import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.Row

// Prepare training documents from a list of (id, text, label) tuples.
val training = spark.createDataFrame(Seq(
  (0L, "a b c d e spark", 1.0),
  (1L, "b d", 0.0),
  (2L, "spark f g h", 1.0),
  (3L, "hadoop mapreduce", 0.0)
)).toDF("id", "text", "label")

// Configure an ML pipeline, which consists of three stages: tokenizer, hashingTF, and lr.
val tokenizer = new Tokenizer()
  .setInputCol("text")
  .setOutputCol("words")
val hashingTF = new HashingTF()
  .setNumFeatures(1000)
  .setInputCol(tokenizer.getOutputCol)
  .setOutputCol("features")
val lr = new LogisticRegression()
  .setMaxIter(10)
  .setRegParam(0.001)
val pipeline = new Pipeline()
  .setStages(Array(tokenizer, hashingTF, lr))

// Fit the pipeline to training documents.
val model = pipeline.fit(training)

// Now we can optionally save the fitted pipeline to disk
model.write.overwrite().save("/tmp/spark-logistic-regression-model")

// We can also save this unfit pipeline to disk
pipeline.write.overwrite().save("/tmp/unfit-lr-model")

// And load it back in during production
val sameModel = PipelineModel.load("/tmp/spark-logistic-regression-model")

// Prepare test documents, which are unlabeled (id, text) tuples.
val test = spark.createDataFrame(Seq(
  (4L, "spark i j k"),
  (5L, "l m n"),
  (6L, "spark hadoop spark"),
  (7L, "apache hadoop")
)).toDF("id", "text")

// Make predictions on test documents.
model.transform(test)
  .select("id", "text", "probability", "prediction")
  .collect()
  .foreach { case Row(id: Long, text: String, prob: Vector, prediction: Double) =>
    println(s"($id, $text) --> prob=$prob, prediction=$prediction")
  }