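The previous version of the shell script for counting push clicks had a problem: it did not deduplicate. If a user tapped the same push several times, every tap incremented the count, so the totals were clearly inflated. What the operations team actually needs is the number of devices that tapped and opened each push.
So I upgraded the script.
The key code is as follows: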
awk -F "," '!a[$2,$3,$4,$5]++' $masterPath/push_feed_back_log.log.$date |awk -F "," '{a[$2" "$5]+=1}END{for(i in a) printf "%s %s\n",i,a[i]}' | sort -k 3 -n -r | head -n 15 >>/tmp/mymaster.txt
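Note: the crucial deduplication part is '!a[$2,$3,$4,$5]++'. It means the combination of columns 2, 3, 4 and 5 must not repeat: awk prints a line only the first time that combination appears, so repeated lines count as a single record. The first column is the timestamp, so it is left out of the key.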
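To make the deduplication step concrete, here is a small sketch that runs the same two awk stages on a few made-up lines. The column meanings (time, push id, device id, platform, app version) and the sample values are assumptions for illustration only; the real log layout is not described in this post.

```shell
# Hypothetical sample log, assumed layout: time,push_id,device_id,platform,app_version
sample='t1,push1,devA,ios,v1
t2,push1,devA,ios,v1
t3,push1,devB,ios,v1
t4,push2,devA,ios,v1'

# a[key]++ evaluates to the old count: 0 the first time a key is seen.
# !0 is true, so awk prints that line; for repeats the pattern is false.
deduped=$(printf '%s\n' "$sample" | awk -F "," '!a[$2,$3,$4,$5]++')

# Count the surviving lines per (column 2, column 5), then sort by the count.
counts=$(printf '%s\n' "$deduped" \
    | awk -F "," '{a[$2" "$5]+=1}END{for(i in a) printf "%s %s\n",i,a[i]}' \
    | sort -k 3 -n -r)

printf '%s\n' "$counts"
# push1 v1 2
# push2 v1 1
```

The second line of the sample differs from the first only in the time column, so it is dropped; push1 ends up with 2 records (devA and devB) instead of 3.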
The complete code is as follows:
#!/bin/sh
date=`date -d -1day +%Y-%m-%d`;
masterPath=/home/www/logs/taofen8/master;
apiPath=/home/www/logs/taofen8/api;
path=/home/www/logs/taofen8;
hisPath=/data/history/taofen8;
log=$path/master/push_feed_back_log.log.$date
echo "==============start==================";
if [ ! -f "$log" ]; then
# The log file is not in the live directory, so read from the history directory instead
masterPath=$hisPath/master;
apiPath=$hisPath/api;
fi
echo "==============date$date===================" > /tmp/mymaster.txt;
# !a[$2,$3,$4,$5]++ deduplicates on columns 2-5
awk -F "," '!a[$2,$3,$4,$5]++' $masterPath/push_feed_back_log.log.$date |awk -F "," '{a[$2" "$5]+=1}END{for(i in a) printf "%s %s\n",i,a[i]}' | sort -k 3 -n -r | head -n 15 >>/tmp/mymaster.txt
echo "===============API==================" >> /tmp/mymaster.txt;
awk -F "," '!a[$2,$3,$4,$5]++' $apiPath/push_feed_back_log.log.$date |awk -F "," '{a[$2" "$5]+=1}END{for(i in a) printf "%s %s\n",i,a[i]}' | sort -k 3 -n -r | head -n 15 >>/tmp/mymaster.txt
echo "=================end================" >> /tmp/mymaster.txt;
This way, only non-duplicate click records are counted.
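The master and API pipelines in the script are identical apart from the input path, so one possible tidy-up is to wrap them in a function. The sketch below is not part of the original script: the name top_clicks and its optional second argument are hypothetical, and it folds the deduplication and the counting into a single awk pass, which should give the same result as the two-stage pipeline.

```shell
# top_clicks FILE [N]: hypothetical helper, not in the original script.
# Prints the top N (default 15) "col2 col5 count" rows after deduplicating
# on columns 2-5.
top_clicks() {
    awk -F "," '
        # Only the first line for each (col2,col3,col4,col5) key is counted.
        !seen[$2,$3,$4,$5]++ { count[$2" "$5]++ }
        END { for (k in count) printf "%s %s\n", k, count[k] }
    ' "$1" | sort -k 3 -n -r | head -n "${2:-15}"
}

# Demo on a throwaway file with made-up rows (assumed column layout).
demo=$(mktemp)
printf '%s\n' \
    't1,push1,devA,ios,v1' \
    't2,push1,devA,ios,v1' \
    't3,push1,devB,ios,v1' \
    't4,push2,devA,ios,v1' > "$demo"
top=$(top_clicks "$demo")
rm -f "$demo"
printf '%s\n' "$top"
```

In the script itself this could be called as top_clicks "$masterPath/push_feed_back_log.log.$date" >> /tmp/mymaster.txt, and likewise with $apiPath.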