How to efficiently batch-insert data and skip duplicates

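I need to periodically sync data from project A to project B. Project A exposes an HTTP interface for fetching the data. I parse the response and batch-insert the parsed records into project B's database. The requirement is that before inserting, I must check whether each record already exists, and skip it if it does. What is an efficient way to do this?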

June 14, 2014 17:15

gq2010

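I recently implemented a requirement similar to yours.

My business requirement was
to fetch data from an Oracle database and sync it into SQL Server.

First, configure the connection between the two databases.
In my case SQL Server connects to Oracle, so configuring a linked server in SQL Server is all it takes.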

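The rest is the content of the stored procedure:

1. Create a temporary table.

Over the remote connection, run insert into <temp table> select ... from <remote table>
so that the data is first copied to the local server.

Then compare the data in the temporary table with the data in your local business table
and find the rows that differ.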

-- (1) Read the NC demand plan remotely, group and aggregate it, and insert the result into the temp table #tmp_pl_plan.
	set @InsertStrSQL = @InsertStrSQL+ @tmpStrSQl;
	print(@InsertStrSQL) ;
	exec(@InsertStrSQL);
	
	select @tmpCont = count(1) from #tmp_pl_plan ;
	
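	-- state: 0 = new, 1 = modified, 2 = deleted
	-- (2) Compare the local data with the temp table and update local rows whose planned quantity (plnum) differs from the temp table.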
	   
		update t set t.plnum  =	a.plnum	,t.state = 1
		from  #tmp_pl_plan a,NC_PL_PLAN t 
		where a.factorycode = t.factorycode and a.weldingdate = t.weldingdate
		and a.divisions = t.divisions and a.zzmadeline = t.zzmadeline 
		and a.zzweldingwayCode  = t.zzweldingwayCode and a.zzmadelinetypeCode = t.zzmadelinetypeCode
		and a.convertedcode = t.convertedcode and  a.ncfprocode = t.ncfprocode	
		and t.plnum != a.plnum	
		and t.weldingdate >=  @fbegdate and t.weldingdate <= @fenddate 
	
	-- (3) Find rows that exist in the local table but not in the temp table, and set plnum = 0, state = 2 (deleted).
		update t set t.plnum = 0 ,t.state = 2  
		from NC_PL_PLAN t
		where t.weldingdate between  @fbegdate and @fenddate 
		and not exists (
			select 1 from  #tmp_pl_plan a where a.factorycode = t.factorycode and a.weldingdate = t.weldingdate
			and a.divisions = t.divisions and a.zzmadeline = t.zzmadeline 
			and a.zzweldingwayCode  = t.zzweldingwayCode and a.zzmadelinetypeCode = t.zzmadelinetypeCode
			and a.convertedcode = t.convertedcode  and a.ncfprocode = t.ncfprocode
			
		);

	
	-- (4) Insert rows that exist in the temp table but not yet in the local table.
	--delete	NC_PL_PLAN;
	insert into NC_PL_PLAN 
	select * from #tmp_pl_plan t
	where t.weldingdate between  @fbegdate and @fenddate 
	and not exists (
		select 1 from   NC_PL_PLAN a where a.factorycode = t.factorycode and a.weldingdate = t.weldingdate
		and a.divisions = t.divisions and a.zzmadeline = t.zzmadeline 
		and a.zzweldingwayCode  = t.zzweldingwayCode and a.zzmadelinetypeCode = t.zzmadelinetypeCode
		and a.convertedcode = t.convertedcode and a.ncfprocode = t.ncfprocode
		and a.weldingdate >=  @fbegdate and a.weldingdate<= @fenddate 
	)
	order by t.weldingdate desc ;
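For anyone who wants to try the same staging-table comparison without a SQL Server and Oracle setup, here is a minimal sketch in Python with the standard sqlite3 module. It is not the answerer's code: SQLite stands in for SQL Server, and it assumes a hypothetical, simplified nc_pl_plan whose match key is only (factorycode, weldingdate) instead of the eight columns used above. It keeps the procedure's state convention (0 new, 1 modified, 2 deleted) and the same four steps.

```python
import sqlite3

# Hypothetical, simplified copy of NC_PL_PLAN: only (factorycode, weldingdate)
# form the match key here; the real procedure matches on eight columns.
# state: 0 = new, 1 = modified, 2 = deleted (the procedure's convention).


def sync_plan(conn, rows, beg, end):
    """Mirror steps (1)-(4) of the procedure for weldingdate in [beg, end]."""
    cur = conn.cursor()
    # (1) Stage the fetched rows in a temporary table.
    cur.execute("CREATE TEMP TABLE tmp_pl_plan "
                "(factorycode TEXT, weldingdate TEXT, plnum INTEGER)")
    cur.executemany("INSERT INTO tmp_pl_plan VALUES (?, ?, ?)", rows)
    # (2) Update local rows whose planned quantity differs from the staged one.
    cur.execute("""
        UPDATE nc_pl_plan
        SET plnum = (SELECT a.plnum FROM tmp_pl_plan a
                     WHERE a.factorycode = nc_pl_plan.factorycode
                       AND a.weldingdate = nc_pl_plan.weldingdate),
            state = 1
        WHERE weldingdate BETWEEN ? AND ?
          AND EXISTS (SELECT 1 FROM tmp_pl_plan a
                      WHERE a.factorycode = nc_pl_plan.factorycode
                        AND a.weldingdate = nc_pl_plan.weldingdate
                        AND a.plnum <> nc_pl_plan.plnum)""", (beg, end))
    # (3) Soft-delete local rows that no longer appear in the staged data.
    cur.execute("""
        UPDATE nc_pl_plan SET plnum = 0, state = 2
        WHERE weldingdate BETWEEN ? AND ?
          AND NOT EXISTS (SELECT 1 FROM tmp_pl_plan a
                          WHERE a.factorycode = nc_pl_plan.factorycode
                            AND a.weldingdate = nc_pl_plan.weldingdate)""",
                (beg, end))
    # (4) Insert staged rows that are not yet in the local table.
    cur.execute("""
        INSERT INTO nc_pl_plan (factorycode, weldingdate, plnum, state)
        SELECT t.factorycode, t.weldingdate, t.plnum, 0 FROM tmp_pl_plan t
        WHERE t.weldingdate BETWEEN ? AND ?
          AND NOT EXISTS (SELECT 1 FROM nc_pl_plan a
                          WHERE a.factorycode = t.factorycode
                            AND a.weldingdate = t.weldingdate)""", (beg, end))
    cur.execute("DROP TABLE tmp_pl_plan")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nc_pl_plan "
             "(factorycode TEXT, weldingdate TEXT, plnum INTEGER, state INTEGER)")
conn.executemany("INSERT INTO nc_pl_plan VALUES (?, ?, ?, ?)",
                 [("F1", "2014-09-01", 10, 0), ("F1", "2014-09-02", 5, 0)])
sync_plan(conn, [("F1", "2014-09-01", 12), ("F1", "2014-09-03", 7)],
          "2014-09-01", "2014-09-30")
print(sorted(conn.execute("SELECT * FROM nc_pl_plan")))
```

In this run 2014-09-01 is updated (state 1), 2014-09-02 is soft-deleted (state 2) and 2014-09-03 is inserted (state 0). Every comparison is a set-based statement against the staged data, which is why this tends to beat checking records one at a time.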

September 18, 2014 10:09

21841625a

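Do it in three steps:
1. Use the key to split the records into two groups: those that already exist and those that don't.
2. Run one batch UPDATE over the existing records, without comparing field values.
3. Run one batch INSERT for the records that don't exist yet.

This is much more efficient. Which batch mechanism you use depends on your database.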
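A minimal sketch of these three steps on the application side, using Python's sqlite3 as a stand-in database. The table item(code, name, price), the function name batch_sync and the chunk size are all hypothetical; rows stands for the (code, name, price) tuples parsed from project A's HTTP response. On another database you would swap in its own bulk mechanism.

```python
import sqlite3

# Hypothetical target table: item(code TEXT PRIMARY KEY, name TEXT, price REAL).
# `rows` stands in for (code, name, price) tuples parsed from the HTTP response.
CHUNK = 500  # keeps each IN (...) list well under SQLite's bound-parameter limit


def batch_sync(conn, rows):
    # Collapse duplicates inside the incoming batch (last occurrence wins).
    rows = list({r[0]: r for r in rows}.values())
    keys = [r[0] for r in rows]
    # Step 1: find which keys already exist with a few IN (...) queries
    # instead of one lookup per record.
    existing = set()
    for i in range(0, len(keys), CHUNK):
        chunk = keys[i:i + CHUNK]
        marks = ",".join("?" * len(chunk))
        existing.update(code for (code,) in conn.execute(
            f"SELECT code FROM item WHERE code IN ({marks})", chunk))
    to_update = [(name, price, code) for code, name, price in rows
                 if code in existing]
    to_insert = [r for r in rows if r[0] not in existing]
    with conn:  # both batches commit or roll back together
        # Step 2: one batch UPDATE for existing records, no field comparison.
        conn.executemany("UPDATE item SET name = ?, price = ? WHERE code = ?",
                         to_update)
        # Step 3: one batch INSERT for new records.
        conn.executemany("INSERT INTO item (code, name, price) VALUES (?, ?, ?)",
                         to_insert)
    return len(to_update), len(to_insert)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (code TEXT PRIMARY KEY, name TEXT, price REAL)")
conn.execute("INSERT INTO item VALUES ('A1', 'old', 1.0)")
counts = batch_sync(conn, [("A1", "new", 2.0), ("B2", "b", 3.0), ("B2", "b2", 4.0)])
print(counts)  # (1, 1)
```

If the question's rule is strictly "skip existing records", simply drop step 2 and insert only to_insert.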

July 25, 2014 15:42

mymailzxj

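You could try the open-source ETL tool Kettle, which works well. Set up the data sources (the source and target databases),
create a new transformation, add an input step followed by an Insert/Update step. It is fairly efficient.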

July 21, 2014 13:14

zidanexun

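Personally, I would create a temporary table, insert the data into it first, remove duplicate records there with SQL deduplication statements, and then sync the data from the temporary table into the other database. In terms of safety and efficiency, this works reasonably well.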
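A minimal sketch of that flow with Python's sqlite3: load into a staging table, deduplicate inside it, then copy only keys the target does not have yet, which is exactly the "skip if it exists" rule from the question. The tables tmp_item and item(code, name) and the function load_dedup are hypothetical, and keeping the first occurrence of a duplicated key is an assumed policy.

```python
import sqlite3

# Hypothetical tables: staging tmp_item (no constraints, receives raw data)
# and target item(code TEXT PRIMARY KEY, name TEXT).


def load_dedup(conn, rows):
    with conn:  # staging, dedup and copy commit as one transaction
        conn.execute("CREATE TEMP TABLE IF NOT EXISTS tmp_item "
                     "(code TEXT, name TEXT)")
        conn.execute("DELETE FROM tmp_item")
        conn.executemany("INSERT INTO tmp_item (code, name) VALUES (?, ?)", rows)
        # Remove duplicates within the batch: keep the first row per code
        # (lowest rowid).
        conn.execute("""
            DELETE FROM tmp_item
            WHERE rowid NOT IN (SELECT MIN(rowid) FROM tmp_item GROUP BY code)""")
        # Copy only rows whose key is not already in the target table.
        conn.execute("""
            INSERT INTO item (code, name)
            SELECT t.code, t.name FROM tmp_item t
            WHERE NOT EXISTS (SELECT 1 FROM item i WHERE i.code = t.code)""")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (code TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO item VALUES ('A1', 'existing')")
conn.commit()
load_dedup(conn, [("A1", "dup of existing"), ("B2", "first"),
                  ("B2", "second"), ("C3", "c")])
print(sorted(conn.execute("SELECT * FROM item")))
```

A1 keeps its existing value, and only one B2 row gets through.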

July 21, 2014 10:57

liujun58love

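Based on the asker's description:
1. Periodic sync: the data doesn't need to be real-time, only synchronized periodically.
2. Data fetched over an HTTP interface: using HTTP suggests the data volume is not large.
It isn't clear what you mean by "efficient". There are two possibilities:
Efficient development and maintenance: the least development effort now and little maintenance later.
Efficient data sync: the data is synchronized quickly.
One option is full sync via files: ftp + shell scripts + SQL*Loader is all configuration work, so development effort is minimal and it is easy to maintain. SQL*Loader also loads data fastest, and since it is a full sync you don't have to worry about data consistency.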
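SQL*Loader is Oracle-specific, so as an assumption here is the same full-refresh idea sketched with Python's sqlite3 instead. The function full_refresh and the item table are hypothetical. The point is that the old copy is deleted and everything is reloaded in a single transaction, so no duplicate check is needed, and a failed load rolls back to the previous data.

```python
import sqlite3


def full_refresh(conn, rows):
    # Full sync: drop the old copy and reload everything in one transaction.
    # Other connections never see a half-loaded table, and if the load fails
    # the DELETE is rolled back together with the partial inserts.
    with conn:
        conn.execute("DELETE FROM item")
        conn.executemany("INSERT INTO item (code, name) VALUES (?, ?)", rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (code TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO item VALUES ('OLD', 'stale')")
conn.commit()
full_refresh(conn, [("A1", "a"), ("B2", "b")])
# A bad file (duplicate key) fails and leaves the previous copy intact.
try:
    full_refresh(conn, [("C3", "c"), ("C3", "c again")])
except sqlite3.IntegrityError:
    pass
print(sorted(conn.execute("SELECT * FROM item")))
```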

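If full sync isn't possible and you can only sync incrementally, you have to deal with duplicates and data consistency.
A better approach is to load the data into a temporary table first, then do a MERGE/UPDATE inside the database.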
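A minimal sketch of "temp table, then merge" using SQLite's UPSERT (INSERT ... ON CONFLICT DO UPDATE, SQLite 3.24 or later) as a stand-in for MERGE in Oracle or SQL Server. It assumes a hypothetical item table with a primary key on code, which the conflict clause needs. The WHERE true is there because SQLite's documentation asks for a WHERE clause on the SELECT to avoid a parsing ambiguity with ON CONFLICT.

```python
import sqlite3

# Hypothetical target: item(code TEXT PRIMARY KEY, name TEXT).
# Requires SQLite >= 3.24 for ON CONFLICT ... DO UPDATE.


def merge_from_staging(conn, rows):
    with conn:
        conn.execute("CREATE TEMP TABLE IF NOT EXISTS tmp_item "
                     "(code TEXT, name TEXT)")
        conn.execute("DELETE FROM tmp_item")
        conn.executemany("INSERT INTO tmp_item VALUES (?, ?)", rows)
        # One statement: insert new codes, update the ones that already exist.
        conn.execute("""
            INSERT INTO item (code, name)
            SELECT code, name FROM tmp_item WHERE true
            ON CONFLICT(code) DO UPDATE SET name = excluded.name""")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (code TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO item VALUES ('A1', 'old')")
conn.commit()
merge_from_staging(conn, [("A1", "new"), ("B2", "b")])
print(sorted(conn.execute("SELECT * FROM item")))
```

To get the question's "skip if it exists" behavior instead of an update, replace the DO UPDATE clause with DO NOTHING.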

July 17, 2014 17:23

james_lover