Implementing Data Mining Association Rules in Oracle

fengxiangpiao


Category: Database

Tags: data mining, Oracle, C, C++, C#

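A few days ago I got hold of the book 数据挖掘基础教程 (an introductory data mining textbook). It struck me that some of its algorithms rest on statistical principles, and statistics of that kind can be computed in Oracle.

I was also in a good mood ahead of watching the Germany vs Spain World Cup match, so I sat down to write a few small scripts. This post is implementation for its own sake, with no optimization at all; if you are interested, feel free to play with it.

For background on association rules in data mining, see:

http://baike.baidu.com/view/1076817.htm?fr=ala0_1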

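An association rule is an implication of the form X → Y,

where X and Y are disjoint itemsets (X ∩ Y = ∅), called the antecedent (left-hand side, LHS) and the consequent (right-hand side, RHS) of the rule, respectively.

The support of a rule in the transaction set D is the percentage of transactions in D that contain both X and Y, that is, a probability: support = count(X^Y) / count(D), where X^Y stands for "contains both X and Y".

The confidence is the percentage of transactions containing X that also contain Y, that is, a conditional probability: confidence = count(X^Y) / count(X).

A rule is considered interesting if it meets both the minimum support threshold and the minimum confidence threshold.

Given a minimum support α = n and a minimum confidence β = m, comparing X^Y/D against α and (X^Y)/X against β tells us whether an association exists.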
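As a quick sanity check of the two formulas, here is a minimal Python sketch. The five transactions, the item names, and the thresholds are made-up illustration data, not the data set used in this post.

```python
# Hypothetical transactions: each one is the set of items bought together.
transactions = [
    {"bread", "milk"},
    {"bread", "beer", "diaper"},
    {"milk", "beer", "diaper"},
    {"bread", "milk", "beer", "diaper"},
    {"bread", "milk", "diaper"},
]

def count_containing(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(x, y, transactions):
    """support(X -> Y) = count(X^Y) / count(D)."""
    return count_containing(x | y, transactions) / len(transactions)

def confidence(x, y, transactions):
    """confidence(X -> Y) = count(X^Y) / count(X); assumes X occurs at least once."""
    return count_containing(x | y, transactions) / count_containing(x, transactions)

x, y = {"diaper"}, {"beer"}
min_support, min_confidence = 0.5, 0.7  # assumed thresholds (alpha and beta)
s = support(x, y, transactions)
c = confidence(x, y, transactions)
interesting = s >= min_support and c >= min_confidence
print(s, c, interesting)
```

With this data, diaper and beer appear together in 3 of 5 transactions and diaper appears in 4, so the rule {diaper} → {beer} has support 0.6 and confidence 0.75 and passes both assumed thresholds.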

[Figure: raw data used]

[Figure: data after denormalization]

[Figure: items to be counted]

-- Create a view of the distinct purchased items
create view distinct_trans as select distinct tranobject from purchase;

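-- Create a view listing the items purchased within each transaction
-- Option 1: the wm_concat function (undocumented; LISTAGG is the documented alternative from 11g Release 2)
create view all_trans as
SELECT tranid, MAX(tranobjects) tranobjects
FROM (select tranid,
             WMSYS.WM_CONCAT(tranobject) OVER(PARTITION BY tranid ORDER BY tranobject) tranobjects
      from purchase)
group by tranid;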

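-- Option 2: the sys_connect_by_path function
-- (create or replace, because option 1 may already have created all_trans)
-- Note: this lists each transaction's items in descending order
create or replace view all_trans as
select tranid, substr(tranobjects, 2) tranobjects -- strip the leading comma
from (
  select distinct tranid,
         FIRST_VALUE(tranobjects) OVER(PARTITION BY tranid ORDER BY levels desc) AS tranobjects -- keep the longest path
  from (
    select tranid, sys_connect_by_path(tranobject, ',') tranobjects, level levels -- item combinations within each transaction
    from purchase
    connect by tranid = prior tranid and tranobject < prior tranobject
  )
);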
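To make the intent of the all_trans view concrete, the following Python sketch performs the same denormalization on a few rows. The purchase rows are hypothetical, and the sketch sorts items in ascending order and drops duplicate rows, which is a simplification rather than an exact mirror of either SQL variant.

```python
from collections import defaultdict

# Hypothetical rows of the purchase table: (tranid, tranobject).
purchase = [
    (1, "milk"),
    (1, "bread"),
    (2, "beer"),
    (2, "diaper"),
    (2, "bread"),
]

def denormalize(rows):
    """Collapse one-row-per-item data into one comma-separated string per transaction."""
    items_by_tran = defaultdict(set)
    for tranid, tranobject in rows:
        items_by_tran[tranid].add(tranobject)
    # Sort so that the same set of items always yields the same string.
    return {tranid: ",".join(sorted(items)) for tranid, items in items_by_tran.items()}

all_trans = denormalize(purchase)
print(all_trans)
```

Sorting matters because the later steps compare strings: two transactions with the same items should produce identical text.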

-- Enumerate every combination of purchased items, i.e. the X^Y itemsets in data mining terms
-- (zuhe = "combination")
create view all_zuhe as
select substr(sys_connect_by_path(tranobject, ','), 2) zuhe
from (select distinct tranobject from purchase)
connect by nocycle tranobject < prior tranobject;

select * from all_zuhe;
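The all_zuhe view enumerates every non-empty combination of distinct items. A rough Python equivalent, using hypothetical item names, could look like the sketch below. It emits each combination in ascending order, whereas the CONNECT BY condition in the view walks from larger to smaller items.

```python
from itertools import combinations

def all_combinations(items):
    """Return every non-empty combination of the distinct items as a comma-separated string."""
    ordered = sorted(set(items))
    result = []
    for size in range(1, len(ordered) + 1):
        for combo in combinations(ordered, size):
            result.append(",".join(combo))
    return result

# Hypothetical item list, standing in for "select distinct tranobject from purchase".
zuhe = all_combinations(["milk", "bread", "beer"])
print(len(zuhe), zuhe)
```

For n distinct items there are 2^n - 1 such combinations, so this brute-force enumeration is only practical for a small catalogue, in line with the "no optimization" caveat at the start of the post.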

-- Keep only the valid pairs of combinations, i.e. the X and Y itemsets, which must share no item
create view full_zuhe as
select a.zuhe X, b.zuhe Y from all_zuhe a, all_zuhe b
where instr(a.zuhe, b.zuhe) = 0 and instr(b.zuhe, a.zuhe) = 0
and not exists (select 1 from distinct_trans c
                where instr(a.zuhe, c.tranobject) > 0 and instr(b.zuhe, c.tranobject) > 0);

select * from full_zuhe;
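The filter in full_zuhe keeps a pair (X, Y) only when the two combinations have no item in common. Here is a small Python sketch of the same idea on hypothetical combinations. It compares sets rather than substrings, which avoids one pitfall of instr: an item whose name is contained in another item's name (say, "milk" and "soymilk") would look like an overlap to a substring test.

```python
def candidate_rules(combos):
    """Return all ordered pairs (X, Y) of itemsets that share no item."""
    itemsets = [frozenset(c.split(",")) for c in combos]
    return [(x, y) for x in itemsets for y in itemsets if x.isdisjoint(y)]

# Hypothetical combinations over three items a, b, c.
combos = ["a", "b", "c", "a,b", "a,c", "b,c", "a,b,c"]
rules = candidate_rules(combos)
print(len(rules))
```

Both (X, Y) and (Y, X) are kept, because X → Y and Y → X are different rules with different confidence values.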

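-- Count the transactions behind each candidate rule (tongji = "statistics")
-- Note: instr is a substring test, so x and y are only counted together when they sit next to each other in tranobjects
create or replace view tongji as
select xy, xy_total, x, x_total, y, y_total, transtotal from
(
select c.y||','||c.x xy, c.x, c.y, -- x and y must be selected here for the outer query to see them
(select count(*) from all_trans a where instr(a.tranobjects, c.x||','||c.y) > 0 or instr(a.tranobjects, c.y||','||c.x) > 0) xy_total, -- transactions containing both x and y
(select count(*) from all_trans b where instr(b.tranobjects, c.y) > 0) y_total, -- transactions containing y
(select count(*) from all_trans b where instr(b.tranobjects, c.x) > 0) x_total, -- transactions containing x
d.transtotal -- total number of transactions
from full_zuhe c, (select count(distinct tranid) transtotal from purchase) d
order by xy_total desc, x_total desc
);

select * from tongji where xy_total >= 3 and y_total >= 3;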
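One caveat of counting with instr is that it matches contiguous text only. The Python sketch below, on hypothetical denormalized rows, contrasts a substring count with a set-containment count for the same itemset. It is meant as an illustration of the difference, not as a replacement for the view.

```python
# Hypothetical denormalized transactions, as comma-separated strings.
all_trans = ["beer,bread,diaper", "beer,diaper", "bread,milk"]

def count_substring(itemset, rows):
    """Count rows whose text contains itemset as a contiguous substring (like instr > 0)."""
    return sum(1 for row in rows if itemset in row)

def count_subset(itemset, rows):
    """Count rows that contain every item of itemset, wherever the items appear."""
    wanted = set(itemset.split(","))
    return sum(1 for row in rows if wanted <= set(row.split(",")))

substring_count = count_substring("beer,diaper", all_trans)
subset_count = count_subset("beer,diaper", all_trans)
print(substring_count, subset_count)
```

Here the substring test misses the first row, because "bread" sits between "beer" and "diaper", while the set test counts it. If exact counts matter, comparing item sets is the safer choice.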


2010-12-17 09:11