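This is a Python wrapper around the counting algorithm from the attached paper.
I chose it because a comparison showed it has the best overall performance.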
xxx@ooo ~/zspal/zfeq/frequent-items/src $ cat Release/pyzlcl.py
#coding:utf-8
from pyzlcl import Lcl

# 0.001 is the lower bound on the frequency to track
lcl = Lcl(0.001)

# Item j is updated j times per outer pass, 200 passes in total
for i in xrange(200):
    for j in xrange(100):
        for k in xrange(j):
            lcl.update(j, 1)

for i in xrange(1, 100, 30):
    print i
    print "occurrences (estimated)", lcl.est(i)
    print "estimate the worst case error in the estimate of a particular item :", lcl.err(i)
    print "---" * 20

# All items whose estimated count is at least 1000, largest first
result = lcl.output(1000)
result.sort(key=lambda x: -x[1])
print result
print lcl.capacity()
xxx@ooo ~/zspal/zfeq/frequent-items/src $ python Release/pyzlcl.py
1
occurrences (estimated) 200
estimate the worst case error in the estimate of a particular item : 0
------------------------------------------------------------
31
occurrences (estimated) 6200
estimate the worst case error in the estimate of a particular item : 0
------------------------------------------------------------
61
occurrences (estimated) 12200
estimate the worst case error in the estimate of a particular item : 0
------------------------------------------------------------
91
occurrences (estimated) 18200
estimate the worst case error in the estimate of a particular item : 0
------------------------------------------------------------
[(99, 19800), (98, 19600), (97, 19400), (96, 19200), (95, 19000), (94,
18800), (93, 18600), (92, 18400), (91, 18200), (90, 18000), (89,
17800), (88, 17600), (87, 17400), (86, 17200), (85, 17000), (84,
16800), (83, 16600), (82, 16400), (81, 16200), (80, 16000), (79,
15800), (78, 15600), (77, 15400), (76, 15200), (75, 15000), (74,
14800), (73, 14600), (72, 14400), (71, 14200), (70, 14000), (69,
13800), (68, 13600), (67, 13400), (66, 13200), (65, 13000), (64,
12800), (63, 12600), (62, 12400), (61, 12200), (60, 12000), (59,
11800), (58, 11600), (57, 11400), (56, 11200), (55, 11000), (54,
10800), (53, 10600), (52, 10400), (51, 10200), (50, 10000), (49,
9800), (48, 9600), (47, 9400), (46, 9200), (45, 9000), (44, 8800),
(43, 8600), (42, 8400), (41, 8200), (40, 8000), (39, 7800), (38,
7600), (37, 7400), (36, 7200), (35, 7000), (34, 6800), (33, 6600),
(32, 6400), (31, 6200), (30, 6000), (29, 5800), (28, 5600), (27,
5400), (26, 5200), (25, 5000), (24, 4800), (23, 4600), (22, 4400),
(21, 4200), (20, 4000), (19, 3800), (18, 3600), (17, 3400), (16,
3200), (15, 3000), (14, 2800), (13, 2600), (12, 2400), (11, 2200),
(10, 2000), (9, 1800), (8, 1600), (7, 1400), (6, 1200), (5, 1000)]
44092
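To make the numbers above easier to read, here is a minimal pure-Python sketch of the Space-Saving idea described in the attached paper. It assumes the wrapped counter behaves roughly like this; I have not mirrored the pyzlcl internals. The class name `SpaceSaving` and the sizing of about 1/phi counters are my own assumptions, and the linear scan for the minimum is only for brevity (a real implementation would use a heap or linked structure).

```python
class SpaceSaving:
    """Hypothetical stand-in with an update/est/err/output interface."""

    def __init__(self, phi):
        # Assumption: keep roughly 1/phi counters.
        self.k = max(1, int(round(1.0 / phi)))
        self.counts = {}  # item -> estimated count
        self.errors = {}  # item -> maximum possible overestimate

    def update(self, item, weight=1):
        if item in self.counts:
            self.counts[item] += weight
        elif len(self.counts) < self.k:
            self.counts[item] = weight
            self.errors[item] = 0
        else:
            # Table is full: evict the smallest counter and let the
            # newcomer inherit its count as a possible overestimate.
            victim = min(self.counts, key=self.counts.get)
            inherited = self.counts.pop(victim)
            del self.errors[victim]
            self.counts[item] = inherited + weight
            self.errors[item] = inherited

    def est(self, item):
        return self.counts.get(item, 0)

    def err(self, item):
        return self.errors.get(item, 0)

    def output(self, thresh):
        return [(i, c) for i, c in self.counts.items() if c >= thresh]


# Same workload as the script above: item j arrives j times per pass.
ss = SpaceSaving(0.001)
for _ in range(200):
    for j in range(100):
        for _ in range(j):
            ss.update(j, 1)

print(ss.est(31), ss.err(31))  # 6200 0
print(len(ss.output(1000)))    # 95 items, namely 5 through 99
```

With only 99 distinct items and room for about 1000 counters, nothing is ever evicted in this sketch, which is consistent with every reported error being 0 in the run above.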
Usage demo in C++
xxx@ooo ~/zspal/zfeq/frequent-items/src $ cat zlcl.cc
#include "prng.h"
#include "lossycount.h"
#include <stdint.h>
#include <cmath>
#include <cstdio>
#include <iostream>
#include <map>
#include <vector>
// Tools::Random and Tools::PRGZipf are declared elsewhere in the project
// (the header is not shown in this post).

size_t RunExact(uint32_t thresh, std::vector<uint32_t>& exact);

template<class T>
void generate_data(T* data, size_t number, uint32_t u32DomainSize, double dSkew);

int main(int argc, char **argv) {
    size_t stNumberOfPackets = 10000000; // number of samples
    double dPhi = 0.0001; // report items whose frequency exceeds dPhi; here 1 in 10,000
    uint32_t u32DomainSize = 1048575; // range of sample values
    std::vector<uint32_t> exact(u32DomainSize + 1, 0); // exact counts, for comparison

    // Generate Zipf-distributed data
    std::vector<uint32_t> data;
    generate_data(&data, stNumberOfPackets, u32DomainSize, 1.0);

    // Split the test data into 20 segments and print statistics after each one
    size_t stRuns = 20;
    size_t stRunSize = data.size() / stRuns;
    size_t stStreamPos = 0;

    LCL_type* lcl = LCL_Init(dPhi);
    for (size_t run = 1; run <= stRuns; ++run) {
        for (size_t i = stStreamPos; i < stStreamPos + stRunSize; ++i) {
            exact[data[i]] += 1;
        }
        for (size_t i = stStreamPos; i < stStreamPos + stRunSize; ++i) {
            LCL_Update(lcl, data[i], 1);
        }

        // Threshold = dPhi * (number of items seen so far), at least 1
        uint32_t thresh = static_cast<uint32_t>(floor(dPhi * run * stRunSize));
        if (thresh == 0) thresh = 1;
        std::cout << "Thresh is " << thresh << std::endl;

        size_t hh = RunExact(thresh, exact);
        std::cout << "Run: " << run << ", Exact: " << hh << std::endl;

        std::map<uint32_t, uint32_t> res;
        res = LCL_Output(lcl, thresh);
        std::cout << "LCL: " << run << ", Count: " << res.size() << std::endl;

        stStreamPos += stRunSize;
    }
    LCL_Destroy(lcl);
    printf("\n");
    return 0;
}

// Count the items whose exact frequency reaches the threshold
size_t RunExact(uint32_t thresh, std::vector<uint32_t>& exact)
{
    size_t hh = 0;
    for (size_t i = 0; i < exact.size(); ++i)
        if (exact[i] >= thresh) ++hh;
    return hh;
}

template<class T>
void generate_data(T* data, size_t number, uint32_t u32DomainSize, double dSkew) {
    prng_type * prng;
    prng = prng_Init(44545, 2);
    int64_t a = (int64_t) (prng_int(prng) % MOD);
    int64_t b = (int64_t) (prng_int(prng) % MOD);
    prng_Destroy(prng);

    Tools::Random r = Tools::Random(0xF4A54B);
    Tools::PRGZipf zipf = Tools::PRGZipf(0, u32DomainSize, dSkew, &r);

    size_t stCount = 0;
    for (size_t i = 0; i < number; ++i) // size_t avoids a signed/unsigned comparison
    {
        ++stCount;
        if (stCount % 500000 == 0)
            std::cout << "Generate Data " << stCount << std::endl;
        uint32_t v = zipf.nextLong();
        // Hash the Zipf value, then mask it into the domain (u32DomainSize is 2^20 - 1)
        uint32_t value = hash31(a, b, v) & u32DomainSize;
        data->push_back(value);
    }
}
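For readers who want to try the exact side of this comparison without building the C++ project, here is a rough Python sketch of the same loop: generate skewed data, feed it in equal segments, and after each segment count how many items reach floor(phi * items seen so far). The function names `zipf_stream` and `exact_heavy_hitters_per_run` are made up for this illustration, the sampler uses `random.choices` with 1/rank^skew weights instead of the project's Zipf generator, and the hash31 scrambling step is left out.

```python
import math
import random
from collections import Counter


def zipf_stream(n, domain_size, skew, seed=0):
    """Draw n values from 1..domain_size with P(r) proportional to 1/r**skew."""
    rng = random.Random(seed)
    ranks = range(1, domain_size + 1)
    weights = [1.0 / r ** skew for r in ranks]
    return rng.choices(ranks, weights=weights, k=n)


def exact_heavy_hitters_per_run(data, phi, runs):
    """Return a list of (run, thresh, exact number of items >= thresh)."""
    run_size = len(data) // runs
    exact = Counter()
    results = []
    for run in range(1, runs + 1):
        start = (run - 1) * run_size
        exact.update(data[start:start + run_size])
        # Same rule as the C++ demo: phi times items seen so far, at least 1
        thresh = max(1, math.floor(phi * run * run_size))
        hh = sum(1 for c in exact.values() if c >= thresh)
        results.append((run, thresh, hh))
    return results


if __name__ == "__main__":
    stream = zipf_stream(100000, 1000, 1.0)
    for run, thresh, hh in exact_heavy_hitters_per_run(stream, 0.001, 20):
        print("Run:", run, "Thresh:", thresh, "Exact:", hh)
```

An approximate counter can then be fed the same segments and its output size compared with the exact count for each run, which is what the C++ demo prints as "Exact" and "Count".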
--
弓长
孝文
、
王
http://zsp.iteye.com/
- frequent-items.7z.zip (172.2 KB)
- Downloads: 21
- AE._Efficient_computation_of_frequent_and__top-k_elements_in_data_streams..pdf.zip (193.6 KB)
- Downloads: 20
- 频繁集合算法一览.zip (163 KB)
- Downloads: 29
- 频繁统计图解.zip (337.5 KB)
- Downloads: 14