Histograms

A histogram tells the optimizer how the data in a column is distributed. The optimizer uses this information to estimate the selectivity of the column for a given query and to arrive at an optimal execution plan.

Column statistics in the form of histograms are appropriate for columns whose data distribution deviates from the expected uniform distribution. For uniformly distributed data, the optimizer can cost a statement properly without them. When data is not uniformly distributed (also known as a highly skewed data distribution), the optimizer may not be able to estimate the selectivity of a query accurately. Histograms provide statistics at a very low level, so they are rarely needed, but they can prove very beneficial in certain scenarios.
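To make the selectivity problem concrete, here is a minimal Python sketch. It is not how Oracle computes anything internally; it only compares a naive "every distinct value is equally common" estimate with the true fraction of rows. The `status` data and both function names are made up for illustration.

```python
from collections import Counter

def uniform_selectivity(values):
    """Estimate for an equality predicate when every distinct value
    is assumed to be equally common: 1 / number of distinct values."""
    distinct = len(set(values))
    return 1.0 / distinct if distinct else 0.0

def actual_selectivity(values, target):
    """True fraction of rows whose value equals target."""
    return Counter(values)[target] / len(values)

# Hypothetical, heavily skewed status column (10,000 rows).
status = ["CLOSED"] * 9000 + ["OPEN"] * 900 + ["ERROR"] * 100

print("uniform estimate:", uniform_selectivity(status))
print("actual for ERROR:", actual_selectivity(status, "ERROR"))
print("actual for CLOSED:", actual_selectivity(status, "CLOSED"))
```

With three distinct values the uniform estimate is one third for every value, while the real fractions range from 1% to 90%. That gap is the kind of information a histogram is meant to supply.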

Columns not eligible for histogram

Please note that histograms should not be used when any of the following is true.

  1. The column data is uniformly distributed.

    For example, suppose a column in a table holds around 100 distinct values. If each value (or each range of values) accounts for a roughly similar number of records, the data is more or less uniformly distributed.

    For instance, the table may have 100000 records, of which 20% have values between 1 and 15, 15% between 16 and 30, 25% between 31 and 50, and so on. If we plot a data distribution graph (histogram chart) from these figures, the height of each value or range will be more or less balanced.

  2. The column is not used in query predicates at all.

    There is no need to provide histogram statistics on columns that are never used in query conditions. Histogram statistics are stored in the dictionary, so they take up space and add to the optimizer's analysis time.

  3. All query predicates or criteria for the column use bind variables!

    Yes, that's right: Oracle requires hard-coded (literal) values in the predicate in order to use the histogram statistics. Predicates that use bind variables will not make use of them.

  4. The column is unique and used only with equality predicates.

Columns eligible for histogram

If none of the above conditions applies to a column, it can be considered for distribution statistics. For example, consider again a column that holds around 100 distinct statuses in a 100000-record table, but this time 80% of the values lie between 15 and 30 and the remaining 20% are spread across the other ranges. If we plot a data distribution graph (histogram chart) from these figures, a few values or ranges will be very tall whereas the others will be very short. The distribution is clearly lopsided (skewed).
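A rough way to check a column for this kind of skew is sketched below in Python. The `skew_ratio` function is a hypothetical helper, not an Oracle facility: it divides the count of the most common value by the average count per distinct value, so a result near 1 suggests uniform data and a much larger result suggests a histogram candidate. The two data sets are synthetic and follow the figures used in the text.

```python
from collections import Counter

def skew_ratio(values):
    """Count of the most common value divided by the average
    count per distinct value (1.0 means perfectly uniform)."""
    counts = Counter(values)
    average = len(values) / len(counts)
    return max(counts.values()) / average

# Uniform case: 100 distinct values, 1,000 rows each.
uniform = [v for v in range(1, 101) for _ in range(1000)]

# Skewed case: 80% of 100,000 rows fall on values 15..30,
# the remaining 20% are spread over the other 84 values.
hot = [v for v in range(15, 31) for _ in range(5000)]
others = [v for v in range(1, 101) if not 15 <= v <= 30]
cold = [others[i % len(others)] for i in range(20000)]
skewed = hot + cold

print("uniform ratio:", skew_ratio(uniform))
print("skewed ratio:", skew_ratio(skewed))
```

Where to draw the line between "uniform enough" and "skewed" is a judgment call; the sketch only gives a number to compare between columns.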

Histograms work well for numeric columns. For character columns, only the first 32 bytes of the string (as of 8.1.7.4) are used to build the histogram. This can result in misleading statistics when the column data is longer than that limit, because values that differ only beyond the 32nd byte cannot be told apart.
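The effect of the prefix limit can be illustrated with a short Python sketch. The 32-byte figure comes from the text above; the `histogram_key` helper and the sample strings are invented for the example and do not mirror Oracle's internal normalization of character values.

```python
PREFIX_BYTES = 32  # limit quoted in the text for 8.1.7.4

def histogram_key(value, limit=PREFIX_BYTES):
    """Return the leading bytes of a string, i.e. the only part of
    the value that would take part in building the histogram."""
    return value.encode("utf-8")[:limit]

# Two hypothetical values that share their first 34 characters.
a = "ACCOUNTING-DEPARTMENT-REGION-EMEA-NORTH"
b = "ACCOUNTING-DEPARTMENT-REGION-EMEA-SOUTH"

print(a == b)                                # False
print(histogram_key(a) == histogram_key(b))  # True

# Short values are unaffected by the limit.
print(histogram_key("OPEN") == histogram_key("CLOSED"))  # False
```

Long values with a common prefix, such as paths or composite codes, are the typical case where character-column histograms become unreliable.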

User-defined histogram values can also be stored in the dictionary using the DBMS_STATS.PREPARE_COLUMN_VALUES and DBMS_STATS.SET_COLUMN_STATS routines.

Dictionary tables

Histogram information is stored in the following dictionary tables.

Histogram values for columns in tables:

  DBA_TAB_HISTOGRAMS

  • endpoint_number - End point number
  • endpoint_value - Normalized end point value for the buckets.
  • endpoint_actual_value - Actual data value; shown only when the column value is non-numeric.

Histogram values for partitioned tables:

  DBA_PART_HISTOGRAMS
  DBA_SUBPART_HISTOGRAMS

For evaluating histograms on indexed columns:

  INDEX_HISTOGRAM

  • repeat_count - Number of times that one or more index keys are repeated in the table.
  • keys_with_repeat_count - Number of index keys that are repeated that many times.
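The relationship between the two columns can be sketched in Python as a "count of counts". This is only an illustration: it reads "repeated N times" as "occurs N times", which is an assumption, and the `index_histogram` function and sample keys are made up rather than taken from Oracle.

```python
from collections import Counter

def index_histogram(keys):
    """Return (repeat_count, keys_with_repeat_count) pairs:
    how many distinct keys occur exactly repeat_count times."""
    per_key = Counter(keys)                # key -> occurrences
    by_repeat = Counter(per_key.values())  # occurrences -> number of keys
    return sorted(by_repeat.items())

# Hypothetical index key values: A occurs 3 times, B and D twice, C once.
keys = ["A", "A", "A", "B", "B", "C", "D", "D"]

for repeat_count, keys_with_repeat_count in index_histogram(keys):
    print(repeat_count, keys_with_repeat_count)
```

A result dominated by a single low repeat count points to fairly even key usage, while a few keys with very high repeat counts point to skew.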

Other Views that give similar data:

  DBA_TAB_COL_STATISTICS
  DBA_PART_COL_STATISTICS
  DBA_SUBPART_COL_STATISTICS

Columns in the above tables are self-explanatory.

Buckets in Histograms

Histogram statistics are stored in the form of buckets. Buckets represent the partitioning of the data values into ranges. By default, 75 buckets are created. A maximum of 254 buckets can be specified for a column. How many buckets a column needs depends on the number of distinct values and how often each occurs. The default number of buckets is usually appropriate, but you may have to experiment with different bucket counts to find the most suitable one.

If the number of distinct column values is smaller than the number of buckets specified, the individual column values and their counts are stored directly as the histogram statistics. If the number of distinct column values is larger than the number of buckets specified, Oracle uses an algorithm to store the values in ranges. If a series of consecutive buckets end on the same value, the repeated entries may be left out of the histogram table to save space.
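The two storage styles described above can be sketched in Python as follows. This is an approximation for illustration only, not Oracle's actual algorithm: all function names are hypothetical, each row is an (endpoint_number, endpoint_value) pair, and the case where the distinct count equals the bucket count (which the text does not cover) is treated as the per-value style.

```python
from collections import Counter

def frequency_histogram(values):
    """One row per distinct value; endpoint_number is the
    cumulative row count up to and including that value."""
    rows, running = [], 0
    for value, count in sorted(Counter(values).items()):
        running += count
        rows.append((running, value))
    return rows

def height_balanced_histogram(values, buckets):
    """Split the sorted data into equally sized buckets and
    record the last value of each bucket as its endpoint."""
    ordered = sorted(values)
    n = len(ordered)
    rows = [(0, ordered[0])]  # bucket 0 holds the minimum value
    for b in range(1, buckets + 1):
        rows.append((b, ordered[(b * n) // buckets - 1]))
    return rows

def compress(rows):
    """Drop a row when the next row has the same endpoint value,
    keeping only the last row of each run."""
    last = len(rows) - 1
    return [r for i, r in enumerate(rows)
            if i == last or rows[i + 1][1] != r[1]]

def build_histogram(values, buckets):
    """Pick the style based on distinct values versus buckets."""
    if len(set(values)) <= buckets:
        return frequency_histogram(values)
    return compress(height_balanced_histogram(values, buckets))

few_distinct = [1, 1, 2, 3, 3, 3, 3, 3, 3, 3]
many_distinct = [1, 2, 3, 5, 5, 5, 5, 5, 5, 7, 8, 9]

print(build_histogram(few_distinct, 5))
print(build_histogram(many_distinct, 4))
```

In the second call the value 5 ends two consecutive buckets, so only the later row is kept and the endpoint numbers jump from 1 to 3. A gap like that in the stored endpoint numbers is the visible sign of a popular value.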

You may find columns with one-bucket histograms. These are as good as having no histogram statistics at all, and the optimizer ignores them.
