How to cache trillions of small HTML files

haiker

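I recently received a requirement to cache the static HTML files generated by the .NET backend. Because the number of files is huge, I gave up on a storage scheme based on file directories. This cache is also a key part of optimizing the .NET application.

This situation is common on large websites built with .NET that have to handle high concurrency.

The cache must be persistent: after an unexpected server restart, nothing should need to be cached again;

The content on the multiple cache servers is kept in sync, so that one machine going down does not cause a failure;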

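Option one uses LVS + two Nginx servers + several MongoDB servers + several Memcached servers

Because LVS automatically balances user requests across the two Nginx servers, the cache cannot rely on Nginx's own proxy_cache

In this option, "several MongoDB + several Memcached" together form a two-level cache service.

Flow:

.NET submits HTML files to MongoDB. The first time Nginx reads a file it comes from MongoDB and is cached in Memcached; later requests are served directly from Memcached;

When .NET updates or deletes an HTML file in MongoDB, Nginx automatically updates or deletes the copy in Memcached;

Entries stored in memcached are given an expiration time;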
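The flow of option one can be sketched roughly as follows. This is only an illustration, not the actual deployment: plain dictionaries stand in for MongoDB and Memcached, the class and method names are hypothetical, and an update here simply invalidates the in-memory copy instead of rewriting it.

```python
import time


class TwoLevelCache:
    """Read-through cache: a persistent store in front of a memory cache with a TTL."""

    def __init__(self, ttl_seconds, clock=time.time):
        self.store = {}    # stand-in for MongoDB (persistent level)
        self.memory = {}   # stand-in for Memcached: key -> (expires_at, html)
        self.ttl = ttl_seconds
        self.clock = clock

    def publish(self, key, html):
        """The backend writes or updates a page; drop any stale memory copy."""
        self.store[key] = html
        self.memory.pop(key, None)

    def delete(self, key):
        """The backend deletes a page from both levels."""
        self.store.pop(key, None)
        self.memory.pop(key, None)

    def get(self, key):
        """Serve from memory while fresh, otherwise fall back to the store."""
        entry = self.memory.get(key)
        if entry is not None and entry[0] > self.clock():
            return entry[1]
        html = self.store.get(key)
        if html is not None:
            # First read (or expired entry): repopulate memory with a new TTL.
            self.memory[key] = (self.clock() + self.ttl, html)
        else:
            self.memory.pop(key, None)
        return html
```

The first `get` for a page reads the persistent level and fills the memory level; later calls are answered from memory until the entry expires.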

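* Because the maintenance workload would be too high, I simply went with option two for now and will transition later.

Option two uses LVS + two Nginx servers + TT (Tokyo Tyrant)

Details to follow...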

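Goal

A request comes in for http://abc.hostname.com/1151/31678.html

If the page is not in the cache, the request is proxied to http://abc.hostname.com/shop/product.aspx?sid=1151&pid=31678&r=yes

Once the dynamic address has been requested, the cache entry is generated. The next request for http://abc.hostname.com/1151/31678.html is answered directly from the cache and no longer by .NET on Windows;

In addition, an interface for clearing the cache is needed: http://abc.hostname.com/capi/shop/product.aspxsid=1151&pid=31678&r=yes

(Note that in the cache-clearing URL there is no question mark before sid)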
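The URL mapping described in the goal can be written down as a small sketch. The function names are hypothetical, and it assumes both path segments are numeric IDs, as in the example URLs; the real rewrite would live in the Nginx configuration.

```python
import re

# /<sid>/<pid>.html, for example /1151/31678.html (numeric IDs are an assumption)
STATIC_PAGE = re.compile(r"^/(?P<sid>\d+)/(?P<pid>\d+)\.html$")


def dynamic_path(static_path):
    """Map a static page path to the dynamic .aspx path used on a cache miss."""
    match = STATIC_PAGE.match(static_path)
    if match is None:
        return None
    return "/shop/product.aspx?sid={sid}&pid={pid}&r=yes".format(**match.groupdict())


def purge_path(static_path):
    """Build the cache-clearing path; it has no question mark before sid."""
    dynamic = dynamic_path(static_path)
    if dynamic is None:
        return None
    return "/capi" + dynamic.replace("?", "", 1)


print(dynamic_path("/1151/31678.html"))
print(purge_path("/1151/31678.html"))
```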

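================== Steps for option one ===================

Install libevent and memcached to make it easier to find keys; TT does not support stats cachedump x 0

1. Install libevent first. An install prefix needs to be specified at configure time, i.e. ./configure --prefix=/usr/local; then make; then make install;
2. Then install memcached. The only extra step is to point configure at the libevent install path, i.e. ./configure --with-libevent=/usr/local; then make; then make install;
This completes the installation of the Memcached server on Linux.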

//Check that libevent was installed successfully:
# ls -al /usr/local/lib | grep libevent

//Check that memcached was installed successfully:
# ls -al /usr/local/bin/mem*

Install TT

(omitted)

Steps to find a key

#Switch the cache in Nginx over to memcached
stats items
stats cachedump 27 0
get key
#The purpose is to learn the format of the keys
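To read the key format out of the `stats cachedump` output, a small parser like the one below may help. It is a sketch that assumes memcached's usual `ITEM <key> [<size> b; <expiry> s]` line format; the sample keys are made up for illustration.

```python
import re

# One line of "stats cachedump <slab> <limit>" output.
ITEM_LINE = re.compile(r"^ITEM (?P<key>\S+) \[(?P<size>\d+) b; (?P<expires>-?\d+) s\]$")


def parse_cachedump(text):
    """Return (key, size_in_bytes, expiry_timestamp) for each ITEM line."""
    items = []
    for line in text.splitlines():
        match = ITEM_LINE.match(line.strip())
        if match:
            items.append(
                (match.group("key"), int(match.group("size")), int(match.group("expires")))
            )
    return items


# Hypothetical output, pasted from a telnet session for illustration only.
sample = """ITEM /1151/31678.html [10240 b; 1363229999 s]
ITEM /1151/31679.html [9876 b; 1363230123 s]
END"""

for key, size, expires in parse_cachedump(sample):
    print(key, size, expires)
```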

Install Nginx and the required modules

For reference

nginx version: rhosync/0.9.52
built by gcc 4.1.2 20080704 (Red Hat 4.1.2-52)
TLS SNI support disabled
configure arguments: --user=www --group=www --prefix=/usr/local/webserver/nginx --with-http_stub_status_module --with-http_ssl_module --add-module=/data/software/htmlcache/ngx_http_upstream_keepalive-d9ac9ad67f45 --add-module=/data/software/htmlcache/agentzh-memc-nginx-module-a0bc33a --add-module=/data/software/htmlcache/agentzh-srcache-nginx-module-921d9b2

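Then the long process of debugging the configuration begins

I consulted the following (the first is the blog 飞去来器，coding也是游戏)
http://fff.iteye.com/blog/697952

http://blog.s135.com/nginx_cache

http://labs.frickle.com/nginx_ngx_slowfs_cache/

After stress testing I will attach a download link for the actual configuration, with comments. Contact: webmaster (#) rhomobi.com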

A primer on keepalive single

keepalive

syntax keepalive num [single]

context upstream

Enables keep-alive connections for the upstream.

Num specifies the maximum number of connections to keep open. If the maximum is reached, the least recently used connections are closed.

Single treats everything as a single host. With this flag connections to different backends are treated as equal.

This module was tested to work with standard round-robin balancing, but it's believed to be compatible with more sophisticated balancers. The only requirement is to activate them before this module, e.g:

upstream htmlcache {
  server 10.0.0.1:11211;
  server 10.0.0.2:11211;
  ip_hash;
  keepalive 512;
}

HEADS UP: Description below is obsolete and needs editing. Keepalive connections to upstream servers are in the main code since 1.1.4, including the latest stable branch of 1.2.x. Check the documentation at nginx.org.
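The behaviour of `num` and `single` described above can be modelled with a toy connection pool. This is only a conceptual sketch, not how the module is implemented: the class, its methods, and the connection IDs are all hypothetical.

```python
from collections import OrderedDict


class KeepalivePool:
    """Toy model of an idle-connection cache with least-recently-used eviction."""

    def __init__(self, max_idle):
        self.max_idle = max_idle
        self.idle = OrderedDict()   # conn_id -> backend, least recently used first
        self.closed = []            # connections closed because the cache was full

    def release(self, conn_id, backend):
        """Park a connection; close the least recently used one if over the limit."""
        self.idle[conn_id] = backend
        self.idle.move_to_end(conn_id)
        while len(self.idle) > self.max_idle:
            old_id, _ = self.idle.popitem(last=False)
            self.closed.append(old_id)

    def acquire(self, backend, single=False):
        """Reuse an idle connection; with single=True any backend is acceptable."""
        for conn_id in reversed(self.idle):
            if single or self.idle[conn_id] == backend:
                del self.idle[conn_id]
                return conn_id
        return None   # caller has to open a new connection
```

With `max_idle=2`, releasing a third connection closes the oldest one, and `acquire(..., single=True)` hands back a parked connection even if it belongs to a different backend.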

TT tuning

A rough estimate, based on the current size of about 10K per cached item

/usr/local/bin/ttserver -port 1978 -thnum 8 -dmn -pid /data/urlcache/ttserver.pid -log /data/urlcache/ttserver.log -le -ulog /data/urlcache/ -ulim 128m -sid 33 -mhost 10.10.10.33 -mport 1978 -rts /data/urlcache/ttserver.rts /data/urlcache/database.tch#bnum=500000
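As a back-of-the-envelope check of the `bnum=500000` setting, the arithmetic can be sketched as below. It assumes the roughly 5G total of HTML files mentioned elsewhere in this post, treats "10K" as 10 KiB, and relies on the guidance I recall from the Tokyo Cabinet documentation that the bucket count should be about 0.5 to 4 times the number of records; treat all three as assumptions.

```python
def estimate_records(total_bytes, avg_item_bytes):
    """Approximate number of cached pages for a given total size."""
    return total_bytes // avg_item_bytes


def bnum_range(records):
    """Suggested bucket-count range: about 0.5x to 4x the record count (assumed guidance)."""
    return records // 2, records * 4


records = estimate_records(5 * 1024 ** 3, 10 * 1024)   # ~5G of pages at ~10K each
low, high = bnum_range(records)
print(records, low, high)
print(low <= 500000 <= high)
```

Under these assumptions the record count comes out close to the configured bucket count, so `bnum=500000` sits inside the suggested range but leaves little headroom if the number of pages grows.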

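Why give up on Nginx's file cache?

Since version 0.7.48, Nginx has supported a caching feature similar to Squid's. This cache uses the URL and related components as the key, hashes it with MD5, and stores the result on disk, so it supports arbitrary URLs and also non-200 status codes such as 404/301/302. The official Nginx web cache can currently only set expiration times for specified URLs or status codes; it does not support a Squid-style PURGE command for manually clearing a specified cached page. However, a third-party Nginx module makes it possible to clear the cache for a specified URL.

There are too many HTML files, more than 5G in total. Relying on nginx's own file cache would mean every nginx host caches everything, which is somewhat wasteful. File-based IO is also a bottleneck and does not make good use of memory.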

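Memo

In Nginx 0.8.55, the version I usually use, proxy_cache and fastcgi_cache are already fairly mature. Together with the third-party ngx_cache_purge_v1.6 module (used to clear the cache for a specified URL), it can fully replace Squid.

Functionally, Nginx already has the web cache acceleration and per-URL cache purging that Squid offers. In terms of performance, Nginx makes considerably better use of multi-core CPUs than Squid. In addition, Nginx is much stronger than Squid in reverse proxying, load balancing, health checks, backend failover, rewrite rules, and ease of use. This lets a single Nginx instance serve as both a "load balancing server" and a "web cache server".

1. Compiling and installing Nginx as a load balancer and cache server on Linux: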

ulimit -SHn 65535
wget ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-8.00.tar.gz
tar zxvf pcre-8.00.tar.gz
cd pcre-8.00/
./configure
make && make install
cd ../

wget http://labs.frickle.com/files/ngx_cache_purge-1.0.tar.gz
tar zxvf ngx_cache_purge-1.0.tar.gz

wget http://nginx.org/download/nginx-0.8.32.tar.gz
tar zxvf nginx-0.8.32.tar.gz
cd nginx-0.8.32/
./configure --user=www --group=www --add-module=../ngx_cache_purge-1.0 --prefix=/usr/local/webserver/nginx --with-http_stub_status_module --with-http_ssl_module
make && make install
cd ../

2. The contents of the /usr/local/webserver/nginx/conf/nginx.conf configuration file are as follows:

user www www;

worker_processes 8;

error_log /usr/local/webserver/nginx/logs/nginx_error.log crit;

pid /usr/local/webserver/nginx/nginx.pid;

#Specifies the value for maximum file descriptors that can be opened by this process.
worker_rlimit_nofile 65535;

events
{
use epoll;
worker_connections 65535;
}

http
{
include mime.types;
default_type application/octet-stream;

charset utf-8;

server_names_hash_bucket_size 128;
client_header_buffer_size 32k;
large_client_header_buffers 4 32k;
client_max_body_size 300m;

sendfile on;
tcp_nopush on;

keepalive_timeout 60;

tcp_nodelay on;

client_body_buffer_size 512k;
proxy_connect_timeout 5;
proxy_read_timeout 60;
proxy_send_timeout 5;
proxy_buffer_size 16k;
proxy_buffers 4 64k;
proxy_busy_buffers_size 128k;
proxy_temp_file_write_size 128k;

gzip on;
gzip_min_length 1k;
gzip_buffers 4 16k;
gzip_http_version 1.1;
gzip_comp_level 2;
gzip_types text/plain application/x-javascript text/css application/xml;
gzip_vary on;

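#Note: the paths given to proxy_temp_path and proxy_cache_path must be on the same partition
proxy_temp_path /data0/proxy_temp_dir;
#Name the web cache zone cache_one with a 200MB in-memory zone; content not accessed for 1 day is removed automatically; the on-disk cache is limited to 30GB.
proxy_cache_path /data0/proxy_cache_dir levels=1:2 keys_zone=cache_one:200m inactive=1d max_size=30g;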

upstream backend_server {
server 192.168.8.43:80 weight=1 max_fails=2 fail_timeout=30s;
server 192.168.8.44:80 weight=1 max_fails=2 fail_timeout=30s;
server 192.168.8.45:80 weight=1 max_fails=2 fail_timeout=30s;
}

server
{
listen 80;
server_name www.yourdomain.com 192.168.8.42;
index index.html index.htm;
root /data0/htdocs/www;

location /
{
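     #If a backend server returns 502 or 504, times out, or hits a similar error, automatically forward the request to another server in the upstream pool for failover.
     proxy_next_upstream http_502 http_504 error timeout invalid_header;
     proxy_cache cache_one;
     #Set different cache times for different HTTP status codes
     proxy_cache_valid  200 304 12h;
     #Build the web cache key from the host, URI and arguments; Nginx hashes the key and stores the cached content in a two-level cache directory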
     proxy_cache_key $host$uri$is_args$args;
     proxy_set_header Host  $host;
     proxy_set_header X-Forwarded-For  $remote_addr;
     proxy_pass http://backend_server;
     expires      1d;
}

#Used to purge the cache. For a URL such as http://192.168.8.42/test.txt, requesting http://192.168.8.42/purge/test.txt purges the cache for that URL.
location ~ /purge(/.*)
{
 #Only allow the specified IPs or IP ranges to purge URL caches.
 allow            127.0.0.1;
 allow            192.168.0.0/16;
 deny            all;
 proxy_cache_purge    cache_one   $host$1$is_args$args;
}    

#Dynamic applications whose URLs end in .php, .jsp or .cgi are not cached.
location ~ .*\.(php|jsp|cgi)?$
{
     proxy_set_header Host  $host;
     proxy_set_header X-Forwarded-For  $remote_addr;
     proxy_pass http://backend_server;
}

access_log  off;
}
}
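To see where a given page ends up under proxy_cache_dir, the key hashing and the levels=1:2 directory layout can be sketched as follows. This mirrors the documented Nginx behaviour as I understand it (the file name is the MD5 of the key, and directory levels are taken from the end of the hash); the function names are hypothetical.

```python
import hashlib


def cache_key(host, uri, args=""):
    """Build the key the same way as $host$uri$is_args$args."""
    is_args = "?" if args else ""   # $is_args is "?" only when arguments exist
    return host + uri + is_args + args


def cache_file_path(cache_dir, key, levels=(1, 2)):
    """Return the on-disk path of the cache file for a key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    parts = []
    end = len(digest)
    for width in levels:
        # Each level takes the next `width` characters, counted from the end of the hash.
        parts.append(digest[end - width:end])
        end -= width
    return "/".join([cache_dir.rstrip("/")] + parts + [digest])


key = cache_key("192.168.8.42", "/test.txt")
print(cache_file_path("/data0/proxy_cache_dir", key))
```

This can be handy when checking by hand whether a particular URL is present in the cache directory.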

3. Start Nginx:

/usr/local/webserver/nginx/sbin/nginx

4. Example of purging the cache for a specified URL:
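Following the comment on the /purge location in the configuration above, the purge address for a cached URL can be derived like this. It is a minimal sketch with a hypothetical helper name; the request itself still has to come from an address allowed by that location.

```python
from urllib.parse import urlsplit, urlunsplit


def purge_url(url, prefix="/purge"):
    """Insert the purge prefix in front of the path of a cached URL."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, prefix + parts.path, parts.query, ""))


print(purge_url("http://192.168.8.42/test.txt"))
```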

Additional working notes

  #proxy_next_upstream http_502 http_504 error timeout invalid_header;
  #proxy_cache pscms;
  #proxy_cache_valid 200 304 1h;
  #proxy_cache_valid 301 302 5m;
  #proxy_cache_valid any 1m;


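 #If a backend server returns 502 or 504, times out, or hits a similar error, automatically forward the request to another server in the upstream pool for failover.
 proxy_next_upstream http_502 http_504 error timeout invalid_header;
 proxy_cache cache_one;
 #Set different cache times for different HTTP status codes
 proxy_cache_valid  200 304 12h;
 #Build the web cache key from the host, URI and arguments; Nginx hashes the key and stores the cached content in a two-level cache directory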
 proxy_cache_key $host$uri$is_args$args;
 proxy_set_header Host  $host;
 proxy_set_header X-Forwarded-For  $remote_addr;
 proxy_pass http://backend_server;
 expires      1d;


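        #Location regexes are matched against the URI without the query string, so the query part is optional here
        location ~* \.(gif|jpg|jpeg|png|bmp|swf|js|css)(\?.*)?$ {
            set $memc_cmd 'get';
            set $memc_key $uri;
            memc_pass jscache;
            #On a memcached miss (404), fall back to the named location below
            error_page 404 = @jsfile;
            }

#Note: memc_pass and proxy_pass are both content handlers and a location runs only one of them,
#so storing the proxied response normally needs a module such as srcache (srcache_store).
location @jsfile{
    set $memc_cmd 'set';
    set $memc_key $uri;
     #Expiration time in seconds
     set $memc_exptime 24;
    proxy_pass http://www.wpadm.com:8080;
    memc_pass jscache;
}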

2013-03-14 10:54