yangyi
  • From: Beijing
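Mesa organizes data into tables. Each table has its own schema describing its structure. The schema mainly consists of a key space K and a value space V, and it also specifies an aggregation function $F : V \times V \to V$ that combines two values into one. The aggregation function must be associative, i.e. $F(F(v_0, v_1), v_2) = F(v_0, F(v_1, v_2))$. The schema also includes indexes over the table's keys K. K and V are represented as columns and tuples, and each column has a fixed type (e.g. int32, int64, string). The schema defines an aggregation function for each individual value column, so that for two tuples (two rows) we get the following formula: F((x1, . . . , ...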
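A minimal sketch of per-column aggregation, assuming the row-level F applies each value column's own aggregator element-wise (the formula above is cut off, so that form is an assumption here). The two-column schema (a summed click counter and a max cost column) and the helper names are hypothetical, chosen only to show how associativity lets updates be folded in any grouping.

```python
from typing import Callable, Dict, Hashable, List, Tuple

Row = Tuple[int, ...]
Aggregator = Callable[[int, int], int]


def make_row_aggregator(column_aggs: List[Aggregator]) -> Callable[[Row, Row], Row]:
    """Build F : V x V -> V by applying each column's aggregator element-wise."""
    def combine(a: Row, b: Row) -> Row:
        return tuple(f(x, y) for f, x, y in zip(column_aggs, a, b))
    return combine


def apply_updates(updates: List[Tuple[Hashable, Row]],
                  combine: Callable[[Row, Row], Row]) -> Dict[Hashable, Row]:
    """Fold a batch of (key, values) updates into a single row per key."""
    table: Dict[Hashable, Row] = {}
    for key, values in updates:
        table[key] = combine(table[key], values) if key in table else values
    return table


# Hypothetical schema: column 0 = clicks (summed), column 1 = peak cost (max).
F = make_row_aggregator([lambda x, y: x + y, max])

updates = [("ad1", (3, 10)), ("ad2", (1, 5)), ("ad1", (2, 7))]
print(apply_updates(updates, F))  # {'ad1': (5, 10), 'ad2': (1, 5)}

# Associativity: the grouping of updates does not change the result.
v0, v1, v2 = (1, 4), (2, 9), (3, 6)
print(F(F(v0, v1), v2) == F(v0, F(v1, v2)))  # True
```

Because every column aggregator is associative, partial aggregates (for example, per batch) can later be combined with each other and still give the same row as aggregating all updates at once.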
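Storm is definitely an evolution of Hadoop, and a system with real vitality. The purpose of batch processing itself is aggregation and relational optimization; the biggest difference between the two lies in how real-time processing and batch processing are combined.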
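Similarity analysis: Euclidean distance (coordinate distance), Pearson correlation coefficient. Unsupervised clustering: k-means clustering (pick the number of clusters with random starting centers, assign each item to the nearest one), hierarchical clustering (merge pairs to build a tree). Search engines: crawler (URL tracking, cycle removal), word segmentation (minimum word-string method; statistical language model, i.e. multiplying the conditional probabilities of adjacent words), indexing/retrieval (BigTable/NoSQL DB), ranking (term frequency, term distance, PageRank, click-through rate learned from samples, position of first occurrence in the opening paragraph, URL, meta tags). Optimization algorithms: minimizing cost over many variables (different layouts): random search (choose the best of many random solutions), hill climbing, simulated annealing (tuning each variable within its upper/lower range), genetic algorithms (random starting points, elite selection, mutation/crossover). Text filtering: ...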
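A minimal sketch of the two similarity measures listed first, assuming each item is a dict of ratings keyed by name (the ratings below are made up). Only keys present in both dicts are compared, and Pearson returns 0.0 when it is undefined (no shared keys or zero variance); that fallback is a choice made here, not part of the definition.

```python
import math
from typing import Dict


def euclidean_distance(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Euclidean (coordinate) distance over the keys both items share."""
    shared = a.keys() & b.keys()
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in shared))


def pearson(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Pearson correlation coefficient over shared keys; 0.0 if undefined."""
    shared = list(a.keys() & b.keys())
    n = len(shared)
    if n == 0:
        return 0.0
    xs = [a[k] for k in shared]
    ys = [b[k] for k in shared]
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


alice = {"A": 1.0, "B": 2.0, "C": 3.0}
bob = {"A": 2.0, "B": 4.0, "C": 6.0}  # rates everything exactly twice as high
print(round(euclidean_distance(alice, bob), 3))  # 3.742
print(round(pearson(alice, bob), 6))             # 1.0
```

The example shows why both measures are useful: the Euclidean distance is large because the scales differ, while Pearson is 1 because the two sets of ratings move together linearly.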

Predicted transactions

When you append to a dynamic array that has reached its capacity, it grows by more than one slot. This matters because resizing is an expensive operation. For example, suppose I have a job to execute and the input data keeps arriving every now and then. There could be 2 models ...
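A toy sketch of why capacity grows geometrically instead of by one. The `DynamicArray` class and its `growth` parameter are hypothetical; real implementations choose their own growth factors. The class counts how many elements get copied during resizes.

```python
class DynamicArray:
    """Toy dynamic array (hypothetical) that multiplies its capacity by `growth` when full."""

    def __init__(self, growth: float = 2.0) -> None:
        self._growth = growth
        self._capacity = 1
        self._size = 0
        self._data = [None] * self._capacity
        self.copies = 0  # total elements copied across all resizes

    def append(self, item) -> None:
        if self._size == self._capacity:
            # Grow by at least one slot: growth=2.0 doubles, growth=1.0 grows by one.
            self._resize(max(self._capacity + 1, int(self._capacity * self._growth)))
        self._data[self._size] = item
        self._size += 1

    def _resize(self, new_capacity: int) -> None:
        new_data = [None] * new_capacity
        for i in range(self._size):  # the expensive part: every element is copied
            new_data[i] = self._data[i]
        self.copies += self._size
        self._data = new_data
        self._capacity = new_capacity

    def __len__(self) -> int:
        return self._size


def copies_for(n: int, growth: float) -> int:
    arr = DynamicArray(growth)
    for i in range(n):
        arr.append(i)
    return arr.copies


print(copies_for(1000, 1.0))  # 499500: grow-by-one copies O(n^2) elements in total
print(copies_for(1000, 2.0))  # 1023: doubling copies O(n), i.e. amortized O(1) per append
```

With doubling, the copies form the series 1 + 2 + 4 + ... + 512, which stays below 2n; growing by one copies 1 + 2 + ... + (n - 1) elements.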
When talking about big data, most people think of it in terms of volume, which is essentially true, but there are other dimensions as well, such as data variety and data relationships. Just like other alternative solutions in the internet industry, CRUD is the set of basic operations on data, th ...
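Nathan Marz's book is still worth reading; after all, that site behind the Great Firewall withstood the test of the US election results being announced. Why discuss big data design? Electronic data has become part of everyday life, whether online or in the records of scientific experiments. Data growth affects how business is conducted and exposes performance bottlenecks in traditional relational databases; traditional design and management approaches are hard to scale to big data. To address the problems big data brings, various areas of software development have introduced many new technologies under the NoSQL banner. These technologies support scaling data to some extent, but there is no consistent design approach: each must be adapted to the specific business and technical architecture. Leaders in these technologies include Google, Amazon, and the open-source community. The technologies include distributed file systems, distributed computing frameworks, distributed lock frameworks, ...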
1) Open jvisualvm, which is bundled with the standard Java SE 6 JDK. It is located under $JAVA_HOME/bin and is built on the NetBeans platform. 2) Click the samples tab. It offers two kinds of profiling: CPU and Memory. 3) Click Settings, choose "include only packages" and setup ...

Powerful LDAP client

Here is a brief introduction to the LDAP protocol, which is used extensively in large organizations: http://quark.humbug.org.au/publications/ldap/ldap_tut.html Before integrating programs with these LDAP servers, try this excellent LDAP testing client, JXplorer: http://jxplorer.org/downloads/u ...
We are a global team of around 30 developers, mostly focused on the same product. We maintain a Maven repository mirror using Nexus. The proxy server is located in the US, so when the Beijing team checks out the latest code and rebuilds, it usually takes a long time. The p ...
1) Check the OS version and architecture, for example: bash-3.00# cat /etc/release Solaris 10 5/08 s10s_u5wos_10 SPARC bash-3.00# isainfo -b 64 2) Although building the code from source isn't hard, it's much easier to download a prebuilt version. For this SunOS, the URL would be: http://wwwm ...

static import

    Blog category:
  • Java
Static import lets you use static members (fields and methods) without qualifying them with the type name, e.g.: import static java.lang.Math.PI; import static java.lang.Math.pow; or all of them at once: import static java.lang.Math.*; Example: public class HelloWorld { public static void main(String[] args) { System.out.println("Hello World!"); System.out.pr ...
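What is an Entity? The first sentence about entities in the Java EE specification is "An entity is a lightweight persistence domain object." An entity should mean the same thing as the E in an E-R diagram, i.e. an entity. Next comes "lightweight persistence domain object". So what is a domain object? According to Wikipedia, it is a multi-tier business sy ...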
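Problem: a north-south bridge can hold only one person at a time. There are 10 people on one side and 12 on the other. Write a multithreaded program that gets everyone to the opposite bank, with each person represented by a thread and the bridge as a shared resource. While people cross, display who is on the bridge and in which direction. import threading import time from collections import deque class Person(threading.Thread): def __init__(self, id, msg): threading.Thread.__init__(self) self.id = id self.ms ...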
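The original solution is truncated, so here is an independent minimal sketch (not a reconstruction of the author's code) that models the bridge as a single `threading.Lock`. Putting the 10 people on the north bank and the 12 on the south bank is an assumption, and the person names are hypothetical. The lock makes crossings mutually exclusive but does not impose any particular order.

```python
import threading
import time
from typing import List, Tuple

bridge = threading.Lock()           # the bridge holds one person at a time
crossings: List[Tuple[str, str]] = []
on_bridge = 0                       # sanity counter, only touched while holding the lock
max_on_bridge = 0


class Person(threading.Thread):
    def __init__(self, name: str, direction: str) -> None:
        super().__init__(name=name)
        self.direction = direction

    def run(self) -> None:
        global on_bridge, max_on_bridge
        with bridge:                # wait until the bridge is free
            on_bridge += 1
            max_on_bridge = max(max_on_bridge, on_bridge)
            print(f"{self.name} is crossing {self.direction}")
            time.sleep(0.001)       # simulate walking across
            crossings.append((self.name, self.direction))
            on_bridge -= 1


# Assumption: 10 people start on the north bank, 12 on the south bank.
people = [Person(f"N{i}", "north -> south") for i in range(10)] + \
         [Person(f"S{i}", "south -> north") for i in range(12)]
for p in people:
    p.start()
for p in people:
    p.join()
print(len(crossings), max_on_bridge)  # 22 1
```

A plain lock is the simplest correct shared-resource model here; if crossings had to alternate directions or be fair, a `threading.Condition` with explicit turn-taking would be needed instead.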
There are 1000 eggs and 10 baskets. How should the eggs be placed in the 10 baskets so that, given any number N < 1000, we can quickly take out exactly N eggs? r = int(input("Input a number:")) n = 1000 i = 1 s = [] while (i - 1) < n: s.append(i) i <<= 1 rest = n - s.pop() + 1 useRest = False if r >= rest: r = r - rest useRest = ...
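A cleaned-up sketch of the same idea, since the original code is truncated: nine baskets hold 1, 2, 4, up to 256 eggs (511 in total) and the tenth holds the remaining 489. To take N eggs, take the 489 basket if N is at least 489, then write what remains (at most 511) in binary over the power-of-two baskets. The function names are hypothetical.

```python
from typing import List


def basket_sizes(total: int = 1000, baskets: int = 10) -> List[int]:
    """Powers of two in the first baskets-1 baskets, the remainder in the last one."""
    powers = [1 << i for i in range(baskets - 1)]   # 1, 2, 4, ..., 256 (sum 511)
    rest = total - sum(powers)                       # 1000 - 511 = 489
    # The scheme needs 0 < rest <= sum(powers) + 1 to reach every count.
    if not 0 < rest <= sum(powers) + 1:
        raise ValueError("this total cannot be covered with powers of two plus one basket")
    return powers + [rest]


def pick(n: int, sizes: List[int]) -> List[int]:
    """Choose whole baskets whose eggs add up to exactly n."""
    if not 0 <= n <= sum(sizes):
        raise ValueError("n out of range")
    *powers, rest = sizes
    chosen = []
    if n >= rest:            # same first step as the original: use the big basket
        chosen.append(rest)
        n -= rest
    for p in powers:         # the remainder fits in the binary digits 1..256
        if n & p:
            chosen.append(p)
    return chosen


sizes = basket_sizes()
print(sizes)             # [1, 2, 4, 8, 16, 32, 64, 128, 256, 489]
print(pick(700, sizes))  # [489, 1, 2, 16, 64, 128]
```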
Implementation in commons-lang's StringUtils: int i = 0; int j = array.length - 1; while (j > i) { swap(array, i, j); j--; i++; } Implementation in the JDK's AbstractStringBuilder: int n = array.length - 1; for (int k = (n-1) >> 1; k >= 0; -- ...
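A Python sketch of the two loop shapes above, for comparison: the first moves two pointers toward the middle, the second uses one index over the first half and mirrors it with `n - k`. The function names are hypothetical, and the JDK method additionally repairs surrogate pairs after the loop, which this sketch skips.

```python
from typing import MutableSequence


def reverse_two_pointer(a: MutableSequence) -> None:
    """Swap from both ends toward the middle (the commons-lang loop shape)."""
    i, j = 0, len(a) - 1
    while j > i:
        a[i], a[j] = a[j], a[i]
        j -= 1
        i += 1


def reverse_half_index(a: MutableSequence) -> None:
    """One index over the first half, mirrored by n - k (the JDK loop shape)."""
    n = len(a) - 1
    # (n - 1) >> 1 is -1 for lengths 0 and 1, so the loop body never runs for them.
    for k in range((n - 1) >> 1, -1, -1):
        a[k], a[n - k] = a[n - k], a[k]


letters = list("hello")
reverse_two_pointer(letters)
print("".join(letters))  # olleh

nums = [1, 2, 3, 4]
reverse_half_index(nums)
print(nums)  # [4, 3, 2, 1]
```

Both versions do about len(a) / 2 swaps; they differ only in whether two indices are maintained or one index plus its mirror.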