Google:
Compute the h-index of a list of papers, given their citation counts. Can you do it in linear time? How about a distributed algorithm for the task?
Facebook:
Given: a citation-count vector with one entry (the citation count) for every paper authored. The h-index is a measure of researcher importance. h-index: the largest number i such that there are i papers each with at least i citations.
1. Suppose the citation vector is sorted. How can the h-index be computed efficiently?
2. Suppose the citation vector is not sorted. How can the h-index be computed efficiently, and with what time complexity? Is there an algorithm that runs in O(n) time?
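Before optimizing, it can help to have a direct transcription of the definition to test faster versions against. This is a minimal O(n^2) sketch; the class name HIndexBruteForce is hypothetical. It tries every candidate h from n down to 1 (h can never exceed the number of papers) and counts how many papers have at least h citations.

```java
final class HIndexBruteForce {
    // O(n^2): checks the definition directly, largest candidate first.
    static int hIndex(int[] citations) {
        int n = citations.length;
        for (int h = n; h >= 1; h--) {
            int atLeastH = 0; // papers with at least h citations
            for (int c : citations) {
                if (c >= h) {
                    atLeastH++;
                }
            }
            if (atLeastH >= h) {
                return h;
            }
        }
        return 0; // no paper has a positive citation count that qualifies
    }
}
```

For example, HIndexBruteForce.hIndex(new int[] {3, 0, 6, 1, 5}) returns 3: three papers (3, 6, 5) have at least 3 citations, but only two have at least 4.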
Princeton algorithm:
Given an array of N positive integers, its h-index is the largest integer h such that there are at least h entries in the array greater than or equal to h. Design an algorithm to compute the h-index of an array.
Hint: median or quicksort-like partitioning and divide-and-conquer.
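One way to read the hint is quickselect-style divide and conquer: partition around a random pivot so the pivot band lands at its position in descending sorted order, decide whether the h boundary lies left of, right of, or inside that band, and recurse into one side only. The sketch below is that interpretation, not necessarily the intended solution; the class name HIndexPartition is hypothetical. A three-way partition is used so that many equal citation counts do not degrade the expected O(n) running time. Note that it reorders the input array.

```java
import java.util.Random;

final class HIndexPartition {
    private static final Random RNG = new Random(42);

    // Expected O(n) time; reorders the input array in place.
    static int hIndex(int[] a) {
        int lo = 0;
        int hi = a.length - 1;
        // Invariant: lo <= h <= hi + 1, and every value in a[0..lo) is >= every value in a[lo..hi].
        while (lo <= hi) {
            int pivot = a[lo + RNG.nextInt(hi - lo + 1)];
            // Three-way partition of a[lo..hi] into descending bands:
            // a[lo..lt) > pivot, a[lt..gt] == pivot, a(gt..hi] < pivot.
            int lt = lo, i = lo, gt = hi;
            while (i <= gt) {
                if (a[i] > pivot) {
                    swap(a, lt++, i++);
                } else if (a[i] < pivot) {
                    swap(a, i, gt--);
                } else {
                    i++;
                }
            }
            // In descending sorted order, positions lt..gt all hold the value pivot,
            // and position k (0-based) counts toward h iff its value is >= k + 1.
            if (pivot >= gt + 1) {
                lo = gt + 1;   // positions 0..gt all qualify
            } else if (pivot < lt + 1) {
                hi = lt - 1;   // positions lt and beyond never qualify
            } else {
                return pivot;  // boundary falls inside the equal band, so h == pivot
            }
        }
        return lo;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}
```

The qualifying positions always form a prefix of the descending order, which is what lets each round discard one side, exactly as quickselect does.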
Solution:
- Create an int[] Histogram of size n + 1, where n is the number of publications of the scientist.
- If all publication reference counts are stored in another int[] references, go over this array and, for each publication with reference count R, do Histogram[min(R, n)]++ (a count above n can be clamped to n, because the h-index never exceeds n). While doing this, keep the maximum clamped count in Max.
- After building the histogram, loop over Histogram in decreasing order starting from i = Max, adding each Histogram[i] to a running total, which is then the number of publications with at least i references. The first time that total is >= i, return i as the h-index.
... As for the distributed part, let several machines each build a Histogram over a disjoint subset of the scientist's publications, then have one machine add those histograms element-wise and compute the h-index from the combined histogram as described above.
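A single-process simulation of that distributed scheme might look like the sketch below. The names shardHistogram, mergeAndScan and DistributedHIndex are hypothetical, each int[] stands in for one machine's disjoint shard, and it assumes every worker is told the total paper count n beforehand (for example from a cheap first counting round) so that all histograms are clamped to the same length n + 1 and can be added slot by slot.

```java
import java.util.ArrayList;
import java.util.List;

final class DistributedHIndex {
    // Worker step: histogram of one shard; counts above totalPapers are clamped,
    // since the h-index can never exceed the total number of papers.
    static long[] shardHistogram(int[] shard, int totalPapers) {
        long[] hist = new long[totalPapers + 1];
        for (int c : shard) {
            hist[Math.min(c, totalPapers)]++;
        }
        return hist;
    }

    // Coordinator step: add the shard histograms element-wise, then scan from the top.
    static int mergeAndScan(List<long[]> histograms, int totalPapers) {
        long[] merged = new long[totalPapers + 1];
        for (long[] h : histograms) {
            for (int i = 0; i <= totalPapers; i++) {
                merged[i] += h[i];
            }
        }
        long atLeast = 0; // papers with at least i citations
        for (int i = totalPapers; i >= 0; i--) {
            atLeast += merged[i];
            if (atLeast >= i) {
                return i;
            }
        }
        return 0; // not reached: the check always passes at i = 0
    }

    // Simulates the machines in one process.
    static int hIndex(List<int[]> shards) {
        int totalPapers = 0;
        for (int[] shard : shards) {
            totalPapers += shard.length;
        }
        List<long[]> histograms = new ArrayList<>();
        for (int[] shard : shards) {
            histograms.add(shardHistogram(shard, totalPapers));
        }
        return mergeAndScan(histograms, totalPapers);
    }
}
```

Only the n + 1 histogram slots travel to the coordinator, not the individual citation counts, so communication per worker is O(n) regardless of how large the counts are.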
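1. Binary search, O(log n): if the array is sorted in descending order and citations[i] >= i + 1 (0-based index i), then h >= i + 1. So h is the first index i where citations[i] < i + 1, or n if there is no such index.
2. O(n) time and space: the trick is that citation counts larger than n can be treated as n, because the h-index can never exceed the number of papers n.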
If there are n papers in total, this problem can be solved in O(n) time with O(n) space. Note that the h-index lies between 0 and n. If the h-index is 10, for example, there must be 10 papers with a citation count >= 10. So if we compute, for every X from 0 to n, the number of papers with at least X citations and store it in an array C, then scanning C from right to left, the h-index is the first (that is, the largest) index i with C[i] >= i. Equality C[i] == i need not occur at that index, so the test must be >=.
Pseudocode:
input array A of length n.
- init array C[0] to C[n] with 0
- foreach p in A: if p >= n, C[n] += 1; else C[p] += 1
- for i = n-1 down to 0: C[i] = C[i] + C[i+1]
- for i = n down to 0: if C[i] >= i, return i
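A small worked example of the pseudocode (the input A = [1, 1, 1] is chosen here for illustration) shows why the final scan has to test C[i] >= i rather than C[i] == i: no index satisfies the equality, yet the h-index is 1.

```latex
\begin{aligned}
&A = [1, 1, 1], \quad n = 3 \\
&c = [0, 3, 0, 0] \quad \text{(slot } i < n \text{: papers with exactly } i \text{ citations; slot } n \text{: papers with at least } n) \\
&C[i] = \sum_{j=i}^{n} c[j] \;\Rightarrow\; C = [3, 3, 0, 0] \\
&C[3] = 0 < 3, \quad C[2] = 0 < 2, \quad C[1] = 3 \ge 1 \;\Rightarrow\; h = 1
\end{aligned}
```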
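import java.util.Arrays;

public class HIndex {

    // Assumes the array is sorted in descending order. O(log N).
    // Returns the first index idx where citation[idx] < idx + 1, which equals h.
    public static int getHIndexFromSorted(int[] citation) {
        int low = 0;
        int high = citation.length - 1;
        while (low <= high) {
            int idx = low + (high - low) / 2;
            if (citation[idx] >= idx + 1) {
                low = idx + 1;
            } else {
                high = idx - 1;
            }
        }
        return low;
    }

    // Sorts the array (in place, ascending), O(N log N), then walks it from the largest count down.
    public static int computeHIndexBySorting(int[] A) {
        Arrays.sort(A);
        int h = 0;
        for (int i = A.length - 1; i >= 0; i--) {
            if (A[i] > h) {
                h++;
            } else {
                return h;
            }
        }
        return h; // every paper qualified, so h equals the number of papers
    }

    // No need to sort the array, O(N) time and O(N) extra space.
    // Counts above N are clamped to N, since h can never exceed N.
    public static int computeHIndex(int[] A) {
        int n = A.length;
        int[] s = new int[n + 1];
        for (int num : A) {
            s[Math.min(n, num)]++;
        }
        int sum = 0; // number of papers with at least i citations
        for (int i = s.length - 1; i >= 0; i--) {
            sum += s[i];
            if (sum >= i) {
                return i;
            }
        }
        return 0; // not reached: at i = 0, sum >= 0 always holds
    }
}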
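getHIndexFromSorted assumes descending order, while the Facebook question only says "sorted". If the vector is sorted in ascending order instead, a similar O(log n) binary search works: at index i there are n - i papers with at least citations[i] citations, and the condition citations[i] >= n - i, once true, stays true for larger i. The sketch below (hypothetical class name HIndexAscending) finds the first index where it holds and returns h = n - i.

```java
final class HIndexAscending {
    // Assumes citations are sorted in ascending order. O(log n).
    // h = n - i for the smallest i with citations[i] >= n - i; h = 0 if there is none.
    static int hIndex(int[] citations) {
        int n = citations.length;
        int lo = 0;
        int hi = n; // search the half-open range [lo, hi)
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if (citations[mid] >= n - mid) {
                hi = mid;      // condition holds: the first such index is at or before mid
            } else {
                lo = mid + 1;  // condition fails: the first such index is after mid
            }
        }
        return n - lo; // lo == n when no index qualifies, giving h = 0
    }
}
```

For example, HIndexAscending.hIndex(new int[] {0, 1, 3, 5, 6}) returns 3, the same answer as the descending version gives on {6, 5, 3, 1, 0}.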
Reference:
http://en.wikipedia.org/wiki/H-index
http://www.careercup.com/question?id=14585874
http://algs4.cs.princeton.edu/25applications/