This is an interview question I got. Given an iterator and a sample size, how can you generate a random List. What I was asked to do is to fill this API
public static List<Node> randonSampleIter(Iterator<Node> items, int sampleSize)
Luckily, I got a similar question, which is kind of the special case for this one on a phone screen. That question is like, given a byte stream which you cannot randomly access, how can you return a random byte from it. Here is the idea, if the byte stream only has 1 byte, you return it. If it has 2 bytes, each of them has 50% to be returned, if it has 3 bytes, each of them share 33%. So we can generalize that for the kth byte, the chances that we return it is 1/k, so the problem get solved.
Now back to this question, you may noticed that we can just replace 1 with sample size (m). We get the first m elements for our temp result, and then for every element we iterate through, it has m/k chances to be included in our result. this can be easily done by generate a random number from 0-k and then to see if the number falls within the range 0-m.
public static List<Node> randonSampleIter(Iterator<Node> items, int sampleSize) {
int count = 0;
List<Node> result = new ArrayList<Node>();
while (count < sampleSize && items.hasNext()) {
result.add(items.next());
count++;
}
if (count < sampleSize) {
//the source given is less than sampleSize, we just return how many we get
return result;
}
Random rand = new Random();
while (items.hasNext()) {
Node element = items.next();
int r = rand.nextInt(count);
if (r < sampleSize) {
result.set(r, element);
}
count++;
}
return result;
}
分享到:
相关推荐
总结起来,"Random sampling of bandlimited signals on graphs" 这一主题涵盖了信号处理和图论的交叉领域,它研究如何在带限图信号中采用随机采样策略,以实现信号的有效重构。通过优化采样策略,我们可以降低采样...
- **随机采样(Random sampling)**:与传统的确定性采样不同,随机采样并不按照固定模式或周期来选择样本点,而是以某种概率分布随机选择样本点。在图上进行随机采样,可以看作是从图节点集合中随机选取节点。 - *...
本文探讨了在数据集大小未知的情况下进行随机抽样的问题,并提出了一种高效的方法——水库抽样算法(Reservoir Sampling Algorithm)。该算法能够在一次遍历过程中,利用常数空间复杂度选取一个不放回的随机样本,...
在论文《Stabilization for Networked Control Systems with Random Sampling Periods》中,作者元李、张庆玲等人将传感器到控制器(Sensor-to-Controller, S-C)的随机延迟和随机采样周期建模为马尔科夫链(Markov ...
ISO 24153:2009标准是国际标准化组织发布的一项关于随机抽样和随机化程序的规范,其主要目标是提供一套统一的方法,以确保在各种领域,尤其是质量控制、统计分析和试验设计中进行公正、无偏的随机样本选择和随机化...
在给定的"Algorithm-random-sampling.zip"文件中,我们聚焦于一个特定的算法类型——随机抽样(Random Sampling),特别是Java 8中的实现。随机抽样在数据处理和分析中扮演着重要角色,它允许我们从大样本集中抽取一...
Random sampling for large-scale MIMO detection of low complexity were studied. In particular, a MMSE approach for random sampling, was formulated from which an iterative detector can be derived for ...
Sub-image Method Based on LBP Preprocessing and Recursive Random Sampling for Face Recognition
在过去十年中,快速扩展随机树(Rapidly-exploring Random Trees,简称RRT)等增量采样算法被证明在实际中工作良好,并具有概率完整性等理论保证。然而,这些算法得到的解的质量的理论界限一直没有建立。本文的贡献...
- **简单随机抽样(Simple Random Sampling)**:每个个体被抽中的概率相等。 - **分层抽样(Stratified Sampling)**:将总体分为若干个子群体(层),然后从每一层中随机抽取样本。 - **系统抽样(Systematic ...
1. **简单随机抽样**(Simple Random Sampling):这是一种最基础的抽样方式,每个个体被抽中的概率相同,适用于总体较小的情况。 2. **分层抽样**(Stratified Sampling):将总体分为若干个层次或子群体,然后从...
Simple random sampling Systematic random sampling Cluster random sampling Stratified random sampling Week 3: Building an intuitive understanding of statistical analysis From area to probability P-...
1. **Random Sampling**:定义一个函数来在配置空间中随机采样。 2. **Nearest Neighbor Search**:实现一种数据结构(如KD树或球树)以高效地找到最近的邻居。 3. **Path Extension**:编写一段代码来计算新边并...
部分分层抽样(Partially Stratified Sampling)是抽样设计的一种,它介于简单随机抽样和完全分层抽样之间。在完全分层抽样中,总体被明确地分为多个不重叠的层,然后从每个层中独立抽取样本。而部分分层抽样则允许...
Chap1~3: 复习基础概率论,包括 random sampling, expectation, Markov’s inequality, variance, and Chebyshev’s inequality Chap4~7: 概率论高阶内容,包括 Chernoff bounds, balls-and-bins models, the ...
generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains....