浏览全部资源
扫码关注微信
1. 湖南工业大学计算机学院智能信息感知及处理技术湖南省重点实验室,湖南 株洲 412007
2. 中南大学信息科学与工程学院,湖南 长沙 410083
[ "袁鑫攀(1982-),男,湖南株洲人,博士,湖南工业大学讲师,主要研究方向为相似性度量和索引。" ]
[ "汪汕飞(1992-),男,安徽安庆人,湖南工业大学硕士生,主要研究方向为信息检索。" ]
[ "龙军(1972-),男,安徽安庆人,中南大学教授、博士生导师,主要研究方向为网络资源管理与可信评估。" ]
[ "章成源(1986-),男,湖南邵阳人,中南大学讲师,主要研究方向为空间数据库。" ]
[ "满君丰(1976-),男,黑龙江海伦人,博士,湖南工业大学教授,主要研究方向为可信软件。" ]
网络出版日期:2017-10,
纸质出版日期:2017-10-25
移动端阅览
袁鑫攀, 汪灿飞, 龙军, 等. FGBC-iDistance:细粒度位码过滤的高维索引[J]. 通信学报, 2017,38(Z1):127-134.
Xin-pan YUAN, Can-fei WANG, Jun LONG, et al. FGBC-iDistance:fine-grained bit-code filter based high-dimensional index[J]. Journal on communications, 2017, 38(Z1): 127-134.
袁鑫攀, 汪灿飞, 龙军, 等. FGBC-iDistance:细粒度位码过滤的高维索引[J]. 通信学报, 2017,38(Z1):127-134. DOI: 10.11959/j.issn.1000-436x.2017245.
Xin-pan YUAN, Can-fei WANG, Jun LONG, et al. FGBC-iDistance:fine-grained bit-code filter based high-dimensional index[J]. Journal on communications, 2017, 38(Z1): 127-134. DOI: 10.11959/j.issn.1000-436x.2017245.
在高维向量检索中,距离计算是很耗时的操作,当前科研趋势是采用分治法来减少距离计算。iDistance通过锚点将向量空间划分为聚类子空间,BC-iDistance通过BC码将聚类子空间每维划分成2个区域。提出一种更加细粒度的区域划分方法和索引结构,每个区域对应一个细粒度位码FGBC(fine grained bit code),通过FGBC码实现了对候选集更精准的过滤。FGBC-iDistance的距离计算次数最好能减少到iDistance的
<math xmlns="http://www.w3.org/1998/Math/MathML"> <mfrac> <mn>1</mn> <mrow> <msup> <mn>2</mn> <mrow> <mn>2</mn><mi>d</mi></mrow> </msup> </mrow> </mfrac> </math>
,在距离计算次数上,有 FGBC-iDistance≤BC-iDistance≤iDistance。实验结果表明当范围查询半径为 0.08 时,FGBC-iDistance的距离计算次数约为20 000次,远小于其他算法,运行时间也相应减少。
In the high-dimensional vector retrieval
distance computation is a very time-consuming operation
the current research trend is to reduce the distance computation using divide and conquer algorithm.iDistance algorithm divides the vector space into subspace of clustering by pivot
BC-iDistance algorithm divides the subspace into 2 partitions in each dimension.A more fine-grained partition algorithm and index structure was proposed
each part corresponded with a unique FGBC code (fine-grained bit code)
which realized the candidate sets filtered more precisely.The distance computation times of FGBC-iDistance can be reduced to
<math xmlns="http://www.w3.org/1998/Math/MathML"> <mfrac> <mn>1</mn> <mrow> <msup> <mn>2</mn> <mrow> <mn>2</mn><mi>d</mi></mrow> </msup> </mrow> </mfrac> </math>
of iDistance.The distance computation frequency comparison:FGBC-iDistance≤BC-iDistance≤iDistance.The experiment results show that when the scope radius of the range query is 0.08
FGBC-iDistance distance computation times is about 20
000
which is far less than other algorithms
the running time is also reduced.
FILZMOSER P , TODOROV V . Review of robust multivariate statistical methods in high dimension [J ] . Analytica Chimica Acta , 2011 , 705 ( 1-2 ): 2 - 14 .
肖碧玉 , 李先斌 , 刘文斌 . 基于图元向量的差异共表达分析研究 [J ] . 电子学报 , 2015 , 10 : 2009 - 2013 .
XIAO B Y , LI X B , LIU W B . Mining differential CO-expression clusters based on graphlet orbits [J ] . Acta Electronica Sinca , 2015 , 10 : 2009 - 2013 .
ROKACH L , MAIMON O . Data mining with decision trees:theory and applications [M ] . World scientific , 2014 .
LIU S G , WEI Y W . Fast nearest neighbor searching based on improved VP-tree [J ] . Pattern Recognition Letters , 2015 , 60-61 ( C ): 8 - 15 .
LIU X . The dirac operator on regular metric trees [J ] . Physics , 2015 , 226 : 165 - 178 .
NOVAK D , BATKO M , ZEZULA P . Metric index:an efficient and scalable solution for precise and approximate similarity search [J ] . Information Systems , 2011 , 36 ( 4 ): 721 - 733 .
ZHOU K , HOU Q , Wang R , et al . Real-time KD-tree construction on graphics hardware [C ] // ACM SIGGRAPH Asia . 2008 :126.
秦龙 , 郑烇 , 桂舒婷 , 等 . 近似逼近高维索引方法 [J ] . 小型微型计算机系统 , 2014 , 35 ( 5 ): 951 - 955 .
QIN L , ZHENG Q , GUI S T , et al . Approximate approaching high-dimensional indexing method [J ] . Journal of Chinese Computer System , 2014 , 35 ( 5 ): 951 - 955 .
WU J Z , YU X , GAO W . Distance computation of ontology vector for ontology similarity measuring and ontology mapping [J ] . Journal of Difference Equations & Applications , 2016 : 1 - 12 .
SCHUH M A , WYLIE T , BANDA J M , et al . A comprehensive study of iDistance partitioning strategies for KNN queries and high- dimensional data indexing [M ] . Big Data . Springer Berlin Heidelberg , 2013 : 238 - 252 .
JAGADISH H V , OOI B C , TAN K L , et al . iDistance [J ] . ACM Transactions on Database Systems , 2005 , 30 ( 2 ): 364 - 397 .
LIANG J J , FENG Y C . BC-iDistance:an optimized high-dimensional index for KNN processing [J ] . Journal of Harbin Institute of Technology , 2008 , 15 ( 6 ): 856 - 861 .
吴纯青 , 任沛阁 , 王小峰 . 基于语义的网络大数据组织与搜索 [J ] . 计算机学报 , 2015 , 38 ( 1 ): 1 - 17 .
WU C Q , REN P G , WANG X F , et al . Survey on semantic-based organization and search technologies for network big data [J ] . Chinese Journal of Computers , 2015 , 38 ( 1 ): 1 - 17 .
0
浏览量
898
下载量
0
CSCD
关联资源
相关文章
相关作者
相关机构