`

Spreadsheet Dependency graph

阅读更多

In spreadsheets, such as Excel, we can write simple formules like the following:

 

 

Here C4 = A3 + A5, so if A3=5 and A5=7, then C4=12. If we change A3 to 6 and hit enter, then C4 is automatically changed to 13. So in the background, there is a dependency graph among the cells. We can explicitly show the graph by the menu Tools | Formula Auditing | Trace Precedence.

 

To go a little bit further, now if we have another cell E6, depends on C4 and C8, where C4 depends on A3 and A5. If we change the value C8, then E6 is recalculated, but not C4. In other words, the automatic recalculation happens only when necessary (The intermediate values are saved somewhere).

 

 

Now I want to replicate this logic in Java (or C#, python, etc). I've been thinking about this in the last couple of years, still hunting for a good solution. The motivation is that I need to run hundreds of millions of complicated calculations every day on thousands of machines. The duplicated calculations is about 20% - 30%. 

 

It's not difficult to replicate spreadsheet logic because the formula in spreadsheet is fairly limited. However, to do the same thing for a general programming language like Java/C#/python is more complex because of rich grammers.

 

One way to do this is the following:

1. have setter/getter for each relevent field, then build dependency graph in getters method (make sure getter is called everywhere). Build cached values and dirty flags in setters.

2. cache returned values for relevent methods, and build dependent graph between methods and variables. Also we need to take care of dirty flags.

 

A similar approach is in the project: http://publicobject.com/glazedlists/.

 

However, I am not satisfied with this approach, the overhead of the coding is way too much. I want a simpler solution. Now I found a better, yet not complete, solution - using annotation and aspectj. Since the coding is pretty simple, I am going to skip it. Rather, I'll show the behavior and the design logic.

 

Now let's try to mimic the spread sheet logic. Suppose we have a simple class:

 

 

public class A
{
    @Depend int i=5;
    @Depend int j=7;

    @Depend public int calc1()
    {
        System.out.println("run calc1() ...");
        return i + j;
    }
}

 

Here we want to monitor the fields i and j, and the method calc1(). My testcase is somewhat like this:

 

 

public void testCalc1()
{
    A a = new A();
    a.calc1();
    a.calc1();
}

 

There should be only 1 printout from the method (meaning we calculate it only once).

 

I like this design because it puts minimal effort on developers, with only one extra annotation. This is the most attractive feature that I can dream of. I can't think of any simpler solution (If you can, let me know).

 

I tried to use annotation only without aspectj, but it didn't work out. The main reason is that we need object identifiers, not just class identifiers. For example, If we have a, b of the same class A, then a.i and b.i are different objects. So when we build dependency graphs, we have to make separate nodes.

I use Object's toString() method as its identifier (so don't overwrite it, :-)). This works only in one JVM.

 

Using aspectj's field get aspect, we could intercept all references to annotated variables and thus we could build the dependency graph between relevent variables and methods. 

 

Using aspectj's field set aspect, we could intercept all variable assignments (there is one exception), then we need to set dirty flags for all parents in the dependency tree. There is one exception in the aspectj field set aspect - it can't intercept array element assignment, such as x[3] = 10. For more information on this, check aspectj documents.

 

Once we have the dependency graph, we could add another aspect to intercept annotated method calls. If the cached value is null or the dirty flag is true, then we run the method and save the result.

 

While these work well for new classes, it would be better if we can handle Collection classes as well. So we just add another aspect to intercept all methods in Collection/Map classes which change the content, such as add/put/set, etc (only when the field is annotated).

 

There are several implicit design decisions:

1. Since we are using aspectj field get interception, this means we intercept every reference of annotated variables. Though aspectj is pretty fast, the overhead is still not acceptable sometimes. If this is the case, e.g., we reference variables many many time, we could introduce a new local variable. As long as the new variable is not annotated, the aspectj interception won't be triggered.

2. I mentioned above that we can't intercept array element assignment, this is a limitation of aspectj. But we implement the aspectj for Collections/Maps, so in case of arrays, use Collections/Maps.

3. Since we are using aspectj interception, we ignore the control graph, i.e., if-else, loops. This tradeoff can be worked around. If the control graph is complex, then we should break the method into several sub-methods to reduce the complexity; otherwise, the harm is minimal so that we could ignore it.

4. Notice that there is no parameter in the annotated method in the above example. All dependencies are class level fields. Technically, we could take parameters into account. But I feel this does more harm than beneficial. Of course, my current approach would make the class stateful (not thread safe).

 

If we google the phrase "program dependency graph", we could find tons of references using different approaches. One is to build the dependency graphs from source code. Though JDK6 has compiler APIs, it's still a tough job. A lot of research papers use abc compiler: http://abc.comlab.ox.ac.uk/introduction.

 

 

Another interesting (and useful) feature we can extend from spreadsheet is the following. In my environment, I need to do a lot of computing in a certain pattern: given a set of input, compute a value; then tweak a field in the input, compute it again, then tweak it back and tweak another field, etc. Sometimes, we forgot to tweak back the field value before going on the next evaluation and disasters happen. For example,

class A
{
		@Depend int i = 5;
		@Depend int j = 11;
				
		@Depend public int prod()
		{
			return i * 18 + j;
		}
		
		@Tweak @Depend public int mytweak()
		{
			i = 6;
			j = 29;
			return prod();
		}
}

 

In the method prod() we want i and j to be the values 5 and 11 (initially assigned). Then in method mytweak() we changed the values i and j. We want this change to be local to mytweak(), not spill out to prod().

 

To be precise, here is a testcase:

public void testPrimitiveTweak()
{
		A a = new A();
		int r = a.mytweak();
		assertTrue(r == 137);
		
		r = a.prod();
		assertTrue(r == 101);		
}
 So whatever change we make in mytweak() is not showing up in prod(), i.e., the changes are "erased" and the original values are restored.

 

  • 大小: 19.4 KB
  • 大小: 6.6 KB
分享到:
评论

相关推荐

    phpSpreadsheet.zip

    $spreadsheet = IOFactory::load('path_to_your_file.xlsx'); $worksheet = $spreadsheet->getActiveSheet(); ``` - 然后,你可以遍历工作表中的行和列,访问单元格的值: ```php foreach ($worksheet->...

    SpreadSheet控件主要属性、方法和事件

    ### SpreadSheet 控件主要属性、方法和事件 #### 一、概述 SpreadSheet 控件作为 Office Web 组件的一部分,广泛应用于各种需要展示和编辑表格数据的网页应用中。该控件的功能强大,提供了丰富的属性、方法及事件...

    OWC中SpreadSheet控件的操作方法集合

    ### OWC中SpreadSheet控件的操作方法集合 在OWC(Open Web Clent)系统中,SpreadSheet控件被广泛应用于实现类似Excel的功能,为用户提供一个直观的数据处理平台。本文档将详细介绍如何通过代码操作SpreadSheet控件...

    VBA_ 使用spreadsheet控件.rar

    在Microsoft Visual Basic for Applications(VBA)中,Spreadsheet控件是一种非常有用的工具,它允许开发者在用户界面中嵌入电子表格功能。本资源“VBA_ 使用spreadsheet控件.rar”显然提供了一些关于如何在VBA项目...

    C#中spreadsheet的使用

    ### C#中Spreadsheet的使用详解 在C#开发中,处理Excel文件是非常常见的需求之一。本文档将详细介绍如何在C#中使用Spreadsheet技术来读取、操作和展示Excel文件的内容。 #### 一、环境准备与前置知识 在开始之前...

    SpreadSheet

    下面将详细解释`SpreadSheet.cpp`和`SpreadSheet.h`这两个关键文件以及如何在VC++中操作Excel的知识点。 `SpreadSheet.cpp`和`SpreadSheet.h`是C++源代码文件,它们一起定义了一个类或一组功能,以便于程序员可以...

    SpreadSheet简单使用实例

    1、SpreadSheet是一个Excel操作封装类,使用起来比其他的更为方便。 2、修正了原版SpreadSheet几个错误问题 3、压缩包里面包含了SpreadSheet的简单使用示例。 4、使用vs2008编译通过

    Perl SpreadSheet_Excel

    Perl SpreadSheet_Excel 是一个基于Perl编程语言的库,用于解析和操作Microsoft Excel电子表格文件。这个库的核心组件是 `Spreadsheet::ParseExcel` 模块,它允许开发者读取Excel文件的内容,包括单元格的数据、公式...

    DevExpress SpreadSheet 报表模板演示源码

    DevExpress SpreadSheet是一款高级的电子表格组件,用于在应用程序中实现类似Excel的功能。这款工具提供了丰富的API和功能,允许开发者创建、编辑和展示复杂的电子表格,包括报表模板的设定。本项目是关于...

    owc11 Spreadsheet 详细使用

    **OWC11 Spreadsheet 全面解析** OWC11 Spreadsheet 是一款强大的电子表格应用程序,由微软在Office Web Components 2010中提供,它允许用户创建、编辑和共享电子表格,类似于Microsoft Excel。本篇文章将深入探讨...

    从Excel模板导出excel文件的PHPSpreadsheet扩展

    $spreadsheet = new Spreadsheet(); // 设置活动sheet索引为0,创建新工作表 $worksheet = $spreadsheet->getActiveSheet(); $worksheet->setCellValue('A1', 'Hello World!'); $worksheet->setCellValue('B2', '这...

    ThinkPHP5.1 框架下 PhpSpreadsheet 操作 Excel 表的导入导出.rar

    本资源主要探讨了如何在ThinkPHP5.1框架下利用PhpSpreadsheet库来实现Excel文件的操作,包括导入和导出功能。下面将详细阐述这两个关键知识点。 **ThinkPHP5.1框架** ThinkPHP5.1是基于PHP语言的一个轻量级的MVC...

    perl Spreadsheet

    Perl Spreadsheet 是一个Perl编程语言中的模块,用于处理电子表格文件,特别是Microsoft Excel的.xls和.xlsx格式。这个模块允许程序员在Perl脚本中创建、读取和修改Excel工作簿,为数据处理和自动化提供了极大的便利...

    Spreadsheet-ParseExcel-0.65.tar.gz

    Spreadsheet-ParseExcel,Perl的Excel插件,可用于读写Excel文件,在Linux下对Excel文件进行处理。 可以用Spreadsheet::ParseExcel先解析excel,再用Spreadsheet::WriteExcel写入。 或者直接使用Spreadsheet::...

    PhpSpreadsheet-master.zip

    《使用PhpSpreadsheet进行高效Excel操作》 在现代的Web开发中,数据处理与分析扮演着至关重要的角色,尤其在需要处理大量结构化数据时,Excel文件常常是首选格式。PhpSpreadsheet是一款强大的PHP库,专门用于读取、...

    关于Spreadsheet 对象的方法、事件、属性

    在IT领域,尤其是在编程和数据分析中,Spreadsheet对象是一个至关重要的概念。Spreadsheet对象通常指的是电子表格应用程序中的工作簿或工作表,如Microsoft Excel或Google Sheets。这些对象提供了丰富的功能,包括...

    基于DevExpress的SpreadsheetControl实现对Excel的打开、预览、保存、另存为、打印示例代码下载.zip

    在本示例中,我们关注的是其SpreadsheetControl,这是一个用于创建电子表格应用的强大工具,能够处理Excel文件并模拟Excel的功能。以下是基于DevExpress的SpreadsheetControl实现对Excel操作的关键知识点: 1. **...

    spreadsheet

    标题中的"spreadsheet"一词通常指的是电子表格,这是一种用于组织和分析数据的软件工具,类似于我们熟知的Microsoft Excel或Google Sheets。在Qt框架中,也有实现类似功能的组件,称为QSpreadsheet或QTableWidget,...

    Spreadsheet_Excel_Writer 生成excel 实例

    "Spreadsheet_Excel_Writer" 是一个库,它允许程序员通过编程方式创建和修改Excel文件,而无需依赖Microsoft Excel软件本身。本实例将深入探讨如何使用Spreadsheet_Excel_Writer库生成Excel文件,帮助开发者更高效地...

    x-spreadsheet.zip

    《前端Web Excel应用:探索x-spreadsheet》 在当今数字化时代,数据处理与分析已成为企业和个人日常工作中不可或缺的一部分。Excel作为传统桌面端的数据管理工具,以其强大的功能和易用性赢得了广泛赞誉。然而,...

Global site tag (gtag.js) - Google Analytics