I spent a week on a data migration. Here are some things I learned:
1.
class OldScenario < ActiveRecord::Base
  establish_connection configurations['old_db']
  set_table_name(connection.current_database + "." + "scenarios")
  has_many :old_page_groups, :order => :created_at, :foreign_key => "scenario_id"

  def old_page_group
    self.old_page_groups.find_all_by_version("draft", :order => "created_at DESC")[0]
  end
end
debugger
The code above creates a model, "OldScenario", and links it to the scenarios table in old_db. I made a lot of models like this.
But how do you make sure every establish_connection call succeeded? I like to put a debugger call right after the class definition. It makes it very convenient to test the connection by hand.
2.
PageGroup.connection.execute("update page_groups set type='Preparation'")
This executes raw SQL instead of going through Rails. It is faster and works well here, but never use it in other situations, such as inside a model.
Here is an example:
self.page_parts.each do |part|
  # Do it this way; do not update through ActiveRecord instances, because of the observer.
  # PagePart.connection.execute("UPDATE page_parts SET page_id=null WHERE id=#{part.id}")
  # part.update_attributes(:page_id => nil)
  PagePart.update_all("page_id = NULL", {:id => part.id}) # code from JP; update_all avoids the callbacks
end
Executing raw SQL there causes weird bugs because of Rails's cache.
Use this instead:
PagePart.update_all("page_id = NULL", {:id => part.id})
3.
If the database is very big (say, more than 400,000 records), do not load all the data at once. It will be very slow and may crash the server. Below is code from seven that solves this kind of problem well.
0.step(length, 1000) do |t|
  Page.find(:all, :conditions => "id>#{t} and id<=#{t}+1000").each do |page|
    page.destroy if page.page_groups.blank?
  end
end
A better way is to use "find_in_batches" (from andy).
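To show the idea behind batching, here is a minimal plain-Ruby sketch. It is not the Rails implementation: each_batch is a hypothetical helper, and an in-memory array of hashes stands in for the pages table. It walks the rows in id order, takes a fixed number at a time, and continues from the last id seen, so gaps in the id sequence never produce empty or oversized windows.

```ruby
# Hypothetical helper (not part of Rails): yields rows in batches ordered by id.
# rows is an in-memory stand-in for a table; each row is a hash with an :id key.
def each_batch(rows, batch_size)
  last_id = 0
  loop do
    # Stand-in for: WHERE id > last_id ORDER BY id LIMIT batch_size
    batch = rows.select { |r| r[:id] > last_id }
                .sort_by { |r| r[:id] }
                .first(batch_size)
    break if batch.empty?
    yield batch
    # The next batch starts right after the last row we handed out.
    last_id = batch.last[:id]
  end
end

# Ids with gaps, as you would have after deleting records.
rows = [1, 2, 5, 8, 13, 21, 34].map { |id| { :id => id } }

batches = []
each_batch(rows, 3) { |batch| batches << batch.map { |r| r[:id] } }
# batches is now [[1, 2, 5], [8, 13, 21], [34]]
```

In a Rails app the same loop would be written as something like Page.find_in_batches(:batch_size => 1000) { |pages| ... }, where each block call receives an array of at most 1000 records.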