Given a stream of unsorted integers, find the median element in sorted order at any given time. So, we will be receiving a continuous stream of numbers in some random order and we don’t know the stream length in advance. Write a function that finds the median of the already received numbers efficiently at any time. We will be asked to find the median multiple times. Just to recall, median is the middle element in an odd length sorted array, and in the even case it’s the average of the middle elements.
This is a data structure question. We will insert the received numbers into such a data structure that we’ll be able to find the median very efficiently. Let’s analyse the possible options.
We can insert the integers to an unsorted array, so we’ll just append the numbers to the array one by one as we receive. Insertion complexity is O(1) but finding the median will take O(N) time, if we use the Median of Medians algorithm that I described in my previous post. However, our goal is to find the median most efficiently, we don’t care that much about insertion performance. But this algorithm does the exact opposite, so unsorted array is not a feasible solution.
What about using a sorted array? We can find the position to insert the received number in O(logN) time using binary search. And at any time if we’re asked for the median we can just return the middle element if the array length is odd, or the average of middle elements if the length is even. This can be done in O(1) time, which is exactly what we’re looking for. But there’s a major drawback of using a sorted array. To keep the array sorted after inserting an element, we may need to shift the elements to the right, which will take O(N) time. So, even if finding the position to insert the number takes O(logN) time, the overall insertion complexity is O(N) due to shifting. But finding the median is still extremely efficient, constant time. However, linear time insertion is pretty inefficient and we would prefer a better performance.
Let’s try linked lists. First unsorted linked list. Insertion is O(1), we can insert either to the head or tail but we suffer from the same problem of unsorted array. Finding the median is O(N). What if we keep the linked list sorted? We can find the median in O(1) time if we keep track of the middle elements. Insertion to a particular location is also O(1) in any linked list, so it seems great thus far. But, finding the right location to insert is not O(logN) as in sorted array, it’s instead O(N) because we can’t perform binary search in a linked list even if it is sorted. So, using a sorted linked list doesn’t worth the effort, insertion is O(N) and finding median is O(1), same as the sorted array. In sorted array insertion is linear due to shifting, here it’s linear because we can’t do binary search in a linked list. This is a very fundamental data structure knowledge that we should keep at the top of our heads all the time.
Using a stack or queue wouldn’t help as well. Insertion would be O(1) but finding the median would be O(N), very inefficient.
What if we use trees? Let’s use a binary search tree with additional information at each node, number of children on the left and right subtrees. We also keep the number of total nodes in the tree. Using this additional information we can find the median in O(logN) time, taking the appropriate branch in the tree based on number of children on the left and right of the current node. However, the insertion complexity is O(N) because a standard binary search tree can degenerate into a linked list if we happen to receive the numbers in sorted order.
So, let’s use a balanced binary search tree to avoid worst case behaviour of standard binary search trees. In a height balanced binary search tree (i.e. AVL tree) the balance factor is the difference between the heights of left and right subtrees. A node with balance factor 0, +1, or -1 is considered to be balanced. However, in our tree the balance factor won’t be height, it is the number of nodes in the left subtree minus the number of nodes in the right subtree. And only the nodes with balance factor of +1 or 0 are considered to be balanced. So, the number of nodes on the left subtree is either equal to or 1 more than the number of nodes on the right subtree, but not less. If we ensure this balance factor on every node in the tree, then the root of the tree is the median, if the number of elements is odd. In the even case, the median is the average of the root and its inorder successor, which is the leftmost descendent of its right subtree. So, complexity of insertion maintaining balance condition is O(logN) and find median operation is O(1) assuming we calculate the inorder successor of the root at every insertion if the number of nodes is even. Insertion and balancing is very similar to AVL trees. Instead of updating the heights, we update the number of nodes information.
Balanced binary search trees seem to be the most optimal solution, insertion is O(logN) and find median is O(1). Can we do better? We can achieve the same complexity with a simpler and more elegant solution. We will use 2 heaps simultaneously, a max-heap and a min-heap with 2 requirements. The first requirement is that the max-heap contains the smallest half of the numbers and min-heap contains the largest half. So, the numbers in max-heap are always less than or equal to the numbers in min-heap. Let’s call this the order requirement. The second requirement is that, the number of elements in max-heap is either equal to or 1 more than the number of elements in the min-heap. So, if we received 2N elements (even) up to now, max-heap and min-heap will both contain N elements. Otherwise, if we have received 2N+1 elements (odd), max-heap will contain N+1 and min-heap N. Let’s call this the size requirement.
The heaps are constructed considering the two requirements above. Then once we’re asked for the median, if the total number of received elements is odd, the median is the root of the max-heap. If it’s even, then the median is the average of the roots of the max-heap and min-heap. Let’s now analyse why this approach works, and how we construct the heaps.
We will have two methods, insert a new received number to the heaps and find median. The insertion procedure takes the two requirements into account, and it’s executed every time we receive a new element. We take two different approaches depending on whether the total number of elements is even or odd before insertion.
Let’s first analyze the size requirement during insertion. In both cases we insert the new element to the max-heap, but perform different actions afterwards. In the first case, if the total number of elements in the heaps is even before insertion, then there are N elements both in max-heap and min-heap because of the size requirement. After inserting the new element to the max-heap, it contains N+1 elements but this doesn’t violate the size requirement. Max-heap can contain 1 more element than min-heap. In the second case, if the number of elements is odd before insertion, then there are N+1 elements in max-heap and N in min-heap. After we insert the new element to the max-heap, it contains N+2 elements. But this violates the size constraint, max-heap can contain at most 1 more element than min-heap. So we pop an element from max-heap and push it to min-heap. The details will be described soon.
Now let’s analyse the order requirement. This requirement forces every element in the max-heap to be less than or equal to all the elements in min-heap. So the max-heap contains the smaller half of the numbers and the min-heap contains the larger half. Note that by design the root of the max-heap is the maximum of the lower half, and root of the min-heap is the minimum of the upper half. Keeping these in mind, we again take two different actions depending on whether the total number of elements is even or odd before insertion. In the even case we just inserted the new element to the max-heap. If the new element is less than all the elements in the min-heap, then the order constraint is satisfied and we’re done. We can perform this check by comparing the new element to the root of the min-heap in O(1) time since the root of the min-heap is the minimum. But if the new element is larger than the root of min-heap then we should exchange those elements to satisfy the order requirement. Note that in this case the root of the max-heap is the new element. So we pop the the root of min-heap and insert it to max-heap. Also pop the root of max-heap and insert it to min-heap. In second case, where the total number of elements before insertion is odd, we inserted the new element to max-heap, then we popped an element and pushed it to the min-heap. To satisfy the order constraint, we pop the maximum element of the max-heap, the root, and insert it to the min-heap. Insertion complexity is O(logN), which is the insertion complexity of a heap.
That is exactly how the insertion procedure works. We ensured that both size and order requirements are satisfied during insertion. Find median function works as follows. At any time we will be queried for the median element. If the total number of elements at that time is odd, then the median is the root of the max-heap. Let’s visualize this with an example. Assume that we have received 7 elements up to now, so the median is the 4th number in sorted order. Currently, max-heap contains 4 smallest elements and min-heap contains 3 largest because of the requirements described above. And since the root of the max-heap is the maximum of the smallest four elements, it’s the 4th element in sorted order, which is the median. Else if the total number of elements is even, then the median is the average of the roots of max-heap and min-heap. Let’s say we have 8 elements, so the median is the average of 4th and 5th elements in sorted order. Currently, both the max-heap and min-heap contain 4 numbers. Root of the max-heap is the maximum of the smallest numbers, which is 4th in sorted order. And root of the min-heap is the minimum of the largest numbers, which is 5th in sorted order. So, the median is the average of the roots. In both cases we can find the median in O(1) time because we only access the roots of the heaps, neither insertion nor removal is performed. Therefore, overall this solution provides O(1) find heap and O(logN) insert.
A code is worth a thousand words, here is the code of the 2-heaps solution. As you can see, it’s much less complicated than it’s described. We can use the heapq module in python, which provides an implementation of min-heap only. But we need a max-heap as well, so we can make a min-heap behave like a max-heap by multiplying the number to be inserted by -1 and then inserting. So, every time we insert or access an element from the max-heap, we multiply the value by -1 to get the original number:
public class MedianOfIntegerStream { private int count; private Queue<Integer> minHeap = new PriorityQueue<>(); private Queue<Integer> maxHeap = new PriorityQueue<>(new Comparator<Integer>(){ @Override public int compare(Integer o1, Integer o2) { return o2-o1; } }); public double addNumberToStream(int num) { maxHeap.offer(num); if(count % 2 == 0) { if(!minHeap.isEmpty() && maxHeap.peek() > minHeap.peek()) { int maxHeapRoot = maxHeap.poll(); int minHeapRoot = minHeap.poll(); maxHeap.offer(minHeapRoot); minHeap.offer(maxHeapRoot); } } else { minHeap.offer(maxHeap.poll()); } count++; return getMedian(); } public double getMedian() { if(count % 2 == 0) { return (maxHeap.peek() + minHeap.peek()) / 2.0; } else { return maxHeap.peek(); } } public static void main(String[] args) { MedianOfIntegerStream streamMedian = new MedianOfIntegerStream(); streamMedian.addNumberToStream(10); streamMedian.addNumberToStream(50); streamMedian.addNumberToStream(-50); streamMedian.addNumberToStream(3); streamMedian.addNumberToStream(16); } }
From:
http://www.ardendertat.com/2011/11/03/programming-interview-questions-13-median-of-integer-stream/
相关推荐
different stream environments and methodological constraints to using LiDAR for this purpose. 4 5 CHAPTER II BACKGROlJND Water Surface Slope Water surface slope is a significant component to many ...
The number of questions is increasing recently. Here is the classification of all `468` questions. For more questions and solutions, you can see my [LintCode](https://github.com/kamyu104/LintCode) ...
级联H桥SVG无功补偿系统在不平衡电网中的三层控制策略:电压电流双闭环PI控制、相间与相内电压均衡管理,级联H桥SVG无功补偿系统在不平衡电网中的三层控制策略:电压电流双闭环PI控制、相间与相内电压均衡管理,不平衡电网下的svg无功补偿,级联H桥svg无功补偿statcom,采用三层控制策略。 (1)第一层采用电压电流双闭环pi控制,电压电流正负序分离,电压外环通过产生基波正序有功电流三相所有H桥模块直流侧平均电压恒定,电流内环采用前馈解耦控制; (2)第二层相间电压均衡控制,注入零序电压,控制通过注入零序电压维持相间电压平衡; (3)第三层相内电压均衡控制,使其所有子模块吸收的有功功率与其损耗补,从而保证所有H桥子模块直流侧电压值等于给定值。 有参考资料。 639,核心关键词: 1. 不平衡电网下的SVG无功补偿 2. 级联H桥SVG无功补偿STATCOM 3. 三层控制策略 4. 电压电流双闭环PI控制 5. 电压电流正负序分离 6. 直流侧平均电压恒定 7. 前馈解耦控制 8. 相间电压均衡控制 9. 零序电压注入 10. 相内电压均衡控制 以上十个关键词用分号分隔的格式为:不
GTX 1080 PCB图纸,内含图纸查看软件
内容概要:本文档详细介绍了利用 DeepSeek 进行文本润色和问答交互时提高效果的方法和技巧,涵盖了从明确需求、提供适当上下文到尝试开放式问题以及多轮对话的十个要点。每一部分内容都提供了具体的示范案例,如指定回答格式、分步骤提问等具体实例,旨在指导用户更好地理解和运用 DeepSeek 提升工作效率和交流质量。同时文中还强调了根据不同应用场景调整提示词语气和风格的重要性和方法。 适用人群:适用于希望通过优化提问技巧以获得高质量反馈的企业员工、科研人员以及一般公众。 使用场景及目标:本文针对所有期望提高 DeepSeek 使用效率的人群,帮助他们在日常工作中快速获取精准的答案或信息,特别是在撰写报告、研究材料准备和技术咨询等方面。此外还鼓励用户通过不断尝试不同形式的问题表述来进行有效沟通。 其他说明:该文档不仅关注实际操作指引,同样重视用户思维模式转变——由简单索取答案向引导 AI 辅助创造性解决问题的方向发展。
基于FPGA与W5500实现的TCP网络通信测试平台开发——Zynq扩展口Verilog编程实践,基于FPGA与W5500芯片的TCP网络通信测试及多路Socket实现基于zynq开发平台和Vivado 2019软件的扩展开发,基于FPGA和W5500的TCP网络通信 测试平台 zynq扩展口开发 软件平台 vivado2019.2,纯Verilog可移植 测试环境 压力测试 cmd命令下ping电脑ip,同时采用上位机进行10ms发包回环测试,不丢包(内部数据回环,需要时间处理) 目前实现单socket功能,多路可支持 ,基于FPGA; W5500; TCP网络通信; Zynq扩展口开发; 纯Verilog可移植; 测试平台; 压力测试; 10ms发包回环测试; 单socket功能; 多路支持。,基于FPGA与W5500的Zynq扩展口TCP通信测试:可移植Verilog实现的高效网络通信
Labview液压比例阀伺服阀试验台多功能程序:PLC通讯、液压动画模拟、手动控制与调试、传感器标定、报警及记录、自动实验、数据处理与查询存储,报表生成与打印一体化解决方案。,Labview液压比例阀伺服阀试验台多功能程序:PLC通讯、液压动画模拟、手动控制与调试、传感器标定、报警管理及实验自动化,labview液压比例阀伺服阀试验台程序:功能包括,同PLC通讯程序,液压动画,手动控制及调试,传感器标定,报警设置及报警记录,自动实验,数据处理曲线处理,数据库存储及查询,报表自动生成及打印,扫码枪扫码及信号录入等~ ,核心关键词:PLC通讯; 液压动画; 手动控制及调试; 传感器标定; 报警设置及记录; 自动实验; 数据处理及曲线处理; 数据库存储及查询; 报表生成及打印; 扫码枪扫码。,Labview驱动的智能液压阀测试系统:多功能控制与数据处理
华为、腾讯、万科员工职业发展体系建设与实践.pptx
1.版本:matlab2014/2019a/2024a 2.附赠案例数据可直接运行matlab程序。 3.代码特点:参数化编程、参数可方便更改、代码编程思路清晰、注释明细。 4.适用对象:计算机,电子信息工程、数学等专业的大学生课程设计、期末大作业和毕业设计。
电网不对称故障下VSG峰值电流限制的柔性控制策略:实现电流平衡与功率容量的优化利用,电网不对称故障下VSG峰值电流限制的柔性控制策略:兼顾平衡电流与功率控制切换的动态管理,电网不对称故障下VSG峰值电流限制的柔性不平衡控制(文章完全复现)。 提出一种在不平衡运行条件下具有峰值电流限制的可变不平衡电流控制方法,可灵活地满足不同操作需求,包括电流平衡、有功或无功恒定运行(即电流控制、有功控制或无功控制之间的相互切),注入电流保持在安全值内,以更好的利用VSG功率容量。 关键词:VSG、平衡电流控制、有功功率控制、无功功率控制。 ,VSG; 峰值电流限制; 柔性不平衡控制; 电流平衡控制; 有功功率控制; 无功功率控制。,VSG柔性控制:在电网不对称故障下的峰值电流限制与平衡管理
1、文件内容:libpinyin-tools-0.9.93-4.el7.rpm以及相关依赖 2、文件形式:tar.gz压缩包 3、安装指令: #Step1、解压 tar -zxvf /mnt/data/output/libpinyin-tools-0.9.93-4.el7.tar.gz #Step2、进入解压后的目录,执行安装 sudo rpm -ivh *.rpm 4、更多资源/技术支持:公众号禅静编程坊
数据集是一个以经典动漫《龙珠》为主题的多维度数据集,广泛应用于数据分析、机器学习和图像识别等领域。该数据集由多个来源整合而成,涵盖了角色信息、战斗力、剧情片段、台词以及角色图像等多个方面。数据集的核心内容包括: 角色信息:包含《龙珠》系列中的主要角色及其属性,如名称、种族、所属系列(如《龙珠》《龙珠Z》《龙珠超》等)、战斗力等级等。 图像数据:提供角色的图像资源,可用于图像分类和角色识别任务。这些图像来自动画剧集、漫画和相关衍生作品。 剧情与台词:部分数据集还包含角色在不同故事中的台词和剧情片段,可用于文本分析和自然语言处理任务。 战斗数据:记录角色在不同剧情中的战斗力变化和战斗历史,为研究角色成长和剧情发展提供支持。 数据集特点 多样性:数据集整合了角色、图像、文本等多种类型的数据,适用于多种研究场景。 深度:不仅包含角色的基本信息,还涵盖了角色的成长历程、技能描述和与其他角色的互动关系。 实用性:支持多种编程语言(如Python、R)的数据处理和分析,提供了详细的文档和示例代码。
基于protues仿真的多功公交站播报系统设计(仿真图、源代码) 该设计为基于protues仿真的多功公交站播报系统,实现温度显示、时间显示、和系统公交站播报功能; 具体功能如下: 1、系统使用51单片机为核心设计; 2、时钟芯片进行时间和日期显示; 3、温度传感器进行温度读取; 4、LCD12864液晶屏进行相关显示; 5、按键设置调节时间; 6、按键设置报站; 7、仿真图、源代码; 操作说明: 1、下行控制报站:首先按下(下行设置按键),(下行指示灯)亮,然后按下(手动播报)按键控制播报下一站; 2、上行控制报站:首先按上(上行设置按键),(上行指示灯)亮,然后按下(手动播报)按键控制播报下一站; 3、按下关闭播报按键,则关闭播报功能和清除显示
采用Java后台技术和MySQL数据库,在前台界面为提升用户体验,使用Jquery、Ajax、CSS等技术进行布局。 系统包括两类用户:学生、管理员。 学生用户 学生用户只要实现了前台信息的查看,打开首页,查看网站介绍、琴房信息、在线留言、轮播图信息公告等,通过点击首页的菜单跳转到对应的功能页面菜单,包括网站首页、琴房信息、注册登录、个人中心、后台登录。 学生用户通过账户账号登录,登录后具有所有的操作权限,如果没有登录,不能在线预约。学生用户退出系统将注销个人的登录信息。 管理员通过后台的登录页面,选择管理员权限后进行登录,管理员的权限包括轮播公告管理、老师学生信息管理和信息审核管理,管理员管理后点击退出,注销登录信息。 管理员用户具有在线交流的管理,琴房信息管理、琴房预约管理。 在线交流是对前台用户留言内容进行管理,删除留言信息,查看留言信息。
MATLAB可以用于开发人脸识别考勤系统。下面是一个简单的示例流程: 1. 数据采集:首先收集员工的人脸图像作为训练数据集。可以要求员工提供多张照片以获得更好的训练效果。 2. 图像预处理:使用MATLAB的图像处理工具对采集到的人脸图像进行预处理,例如灰度化、裁剪、缩放等操作。 3. 特征提取:利用MATLAB的人脸识别工具包,如Face Recognition Toolbox,对处理后的图像提取人脸特征,常用的方法包括主成分分析(PCA)和线性判别分析(LDA)等。 4. 训练模型:使用已提取的人脸特征数据集训练人脸识别模型,可以选择支持向量机(SVM)、卷积神经网络(CNN)等算法。 5. 考勤系统:在员工打卡时,将摄像头捕获的人脸图像输入到训练好的模型中进行识别,匹配员工信息并记录考勤数据。 6. 结果反馈:根据识别结果,可以自动生成考勤报表或者实时显示员工打卡情况。 以上只是一个简单的步骤,实际开发过程中需根据具体需求和系统规模进行定制和优化。MATLAB提供了丰富的图像处理和机器学习工具,是开发人脸识别考勤系统的一个很好选择。
hjbvbnvhjhjg
HCIP、软考相关学习PPT提供下载
绿豆BOX UI8版:反编译版六个全新UI+最新后台直播管理源码 最新绿豆BOX反编译版六个UI全新绿豆盒子UI8版本 最新后台支持直播管理 作为UI6的升级版,UI8不仅修复了前一版本中存在的一些BUG,还提供了6套不同的UI界面供用户选择,该版本有以下特色功能: 在线管理TVBOX解析 在线自定义TVBOX 首页布局批量添加会员信息 并支持导出批量生成卡密 并支持导出直播列表管理功能
vue3的一些语法以及知识点
西门子大型Fanuc机器人汽车焊装自动生产线程序经典解析:PLC博图编程与MES系统通讯实战指南,西门子PLC博图汽车焊装自动生产线FANUC机器人程序经典结构解析与MES系统通讯,西门子1500 大型程序fanuc 机器人汽车焊装自动生产线程序 MES 系统通讯 大型程序fanuc机器人汽车焊装自动生产线程序程序经典结构清晰,SCL算法堆栈,梯形图和 SCL混编使用博图 V14以上版本打开 包括: 1、 PLC 博图程序 2 触摸屏程序 ,西门子1500; 大型程序; fanuc机器人; 汽车焊装自动生产线; MES系统通讯; SCL算法; 梯形图; SCL混编; 博图V14以上版本。,西门子博图大型程序:汽车焊装自动生产线MES系统通讯与机器人控制