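XPDF Usage Guide
XPDF version: 3.02
Date: 2008-11-26
Document version: V1.0
1. Overview
To read the text content of a PDF file, you can use the open-source project xpdf. Download: http://www.foolabs.com/xpdf/download.html.
You need two packages: xpdf-3.02pl2-win32.zip and xpdf-chinese-simplified.tar.gz (Chinese support).
2. Installation
Unzip xpdf-3.02pl2-win32.zip into the xpdf directory on drive D; d:\xpdf is used as the xpdf working path from here on.
Extract xpdf-chinese-simplified.tar into the xpdf-chinese-simplified directory under the xpdf root.
To enable the Simplified Chinese language pack, save a copy of the sample-xpdfrc file in the xpdf directory as xpdfrc.
Note: this is the configuration file, and its name must be xpdfrc. With any other name, even when "-cfg xpdfrc2" is passed to pdftotext.exe to name the configuration file, pdftotext.exe does not appear to use it. To avoid confusion, name the configuration file xpdfrc.
Then append the following configuration to the end of xpdfrc, and make sure the map file paths are correct.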
#----- begin Chinese Simplified support package (2004-jul-27)
cidToUnicode Adobe-GB1 D:/xpdf/xpdf-chinese-simplified/Adobe-GB1.cidToUnicode
unicodeMap ISO-2022-CN D:/xpdf/xpdf-chinese-simplified/ISO-2022-CN.unicodeMap
unicodeMap EUC-CN D:/xpdf/xpdf-chinese-simplified/EUC-CN.unicodeMap
unicodeMap GBK D:/xpdf/xpdf-chinese-simplified/GBK.unicodeMap
cMapDir Adobe-GB1 D:/xpdf/xpdf-chinese-simplified/Cmap
toUnicodeDir D:/xpdf/xpdf-chinese-simplified/Cmap
#displayCIDFontTT Adobe-GB1 /usr/..../gkai00mp.ttf
#----- end Chinese Simplified support package
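Because a wrong map path is an easy mistake to make here, a small check can help. The following is only a sketch, not part of xpdf: the class name XpdfrcPathCheck and the location D:/xpdf/xpdfrc are assumptions, and it assumes the paths contain no spaces or quotes. It reads the configuration file and lists every path named by the four directives above that does not exist on disk.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of xpdf: verifies the paths written in xpdfrc.
final class XpdfrcPathCheck {
    // Directives from the language pack whose last token is a file or directory path.
    private static final List<String> PATH_DIRECTIVES =
            Arrays.asList("cidToUnicode", "unicodeMap", "cMapDir", "toUnicodeDir");

    /** Returns the paths named by path directives that do not exist on disk. */
    static List<String> missingPaths(List<String> lines) {
        List<String> missing = new ArrayList<String>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            String[] tokens = line.split("\\s+");
            if (!PATH_DIRECTIVES.contains(tokens[0])) {
                continue; // not a directive that carries a path
            }
            String path = tokens[tokens.length - 1];
            if (!Files.exists(Paths.get(path))) {
                missing.add(path);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws Exception {
        // Assumed location of the config file; pass another path as the first argument.
        String config = args.length > 0 ? args[0] : "D:/xpdf/xpdfrc";
        // ISO-8859-1 maps every byte to a char, so reading never fails on odd bytes.
        List<String> lines = Files.readAllLines(Paths.get(config), StandardCharsets.ISO_8859_1);
        for (String path : missingPaths(lines)) {
            System.out.println("Missing: " + path);
        }
    }
}
```

A stray space inside a path splits it into two tokens, so the check reports the trailing fragment as missing, which makes that kind of typo visible.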
In addition, the configuration file does not originally contain a textPageBreaks setting. To suppress the page-break character, add the following under the "text output control" section of xpdfrc:
# If set to "yes", text extraction will insert page
# breaks (form feed characters) between pages. This
# defaults to "yes".
textPageBreaks no
Setting textPageBreaks to no means that no page-break character is inserted between two pages of the PDF document. This is done because that character (a form feed) can cause trouble when the extracted text is later parsed as XML with SAX.
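If the configuration cannot be changed, the same problem can be handled on the Java side. The sketch below is an assumption about how one might do that, not something xpdf provides: the hypothetical class XmlTextCleaner drops every character that XML 1.0 does not allow, which includes the form feed (U+000C).

```java
// Hypothetical helper: removes characters that are not legal in XML 1.0 text.
final class XmlTextCleaner {
    static String stripInvalidXmlChars(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            // Character ranges permitted by the XML 1.0 "Char" production.
            boolean allowed = cp == 0x9 || cp == 0xA || cp == 0xD
                    || (cp >= 0x20 && cp <= 0xD7FF)
                    || (cp >= 0xE000 && cp <= 0xFFFD)
                    || (cp >= 0x10000 && cp <= 0x10FFFF);
            if (allowed) {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "\f" is the form feed that pdftotext inserts between pages by default.
        System.out.println(stripInvalidXmlChars("page one\fpage two"));
    }
}
```

Removing the character joins the text of adjacent pages; replacing it with a newline instead would be an equally reasonable choice.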
The sample configuration file has textEncoding commented out, so the default character set is Latin1. It must be enabled:
#textEncoding UTF-8
textEncoding GBK
3. Command-line invocation
D:\xpdf\xpdf-3.02pl2-win32>pdftotext.exe -cfg xpdfrc d:\dwr中文文档(pdf).pdf
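When no output file is given, pdftotext writes the text next to the PDF under the same name with a .txt extension, encoded according to textEncoding. As a rough sketch of picking that file up from Java (the class PdfTextOutput and its method names are made up for illustration, and GBK is assumed to match the textEncoding setting):

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for locating and reading the file produced by pdftotext.
final class PdfTextOutput {
    /** Derives the default output name: "x.pdf" becomes "x.txt", otherwise ".txt" is appended. */
    static Path defaultTextFile(Path pdf) {
        String name = pdf.getFileName().toString();
        if (name.endsWith(".pdf") || name.endsWith(".PDF")) {
            name = name.substring(0, name.length() - 4);
        }
        return pdf.resolveSibling(name + ".txt");
    }

    /** Reads the whole text file using the given charset. */
    static String readText(Path textFile, Charset charset) throws IOException {
        return new String(Files.readAllBytes(textFile), charset);
    }

    public static void main(String[] args) throws IOException {
        // The first argument is the PDF that was already converted on the command line.
        Path pdf = Paths.get(args[0]);
        // Assumption: xpdfrc sets "textEncoding GBK", so the file is decoded as GBK.
        System.out.println(readText(defaultTextFile(pdf), Charset.forName("GBK")));
    }
}
```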
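4. Calling from Java
The arguments passed to pdftotext.exe are: -enc UTF-8 sets the output text encoding, -q suppresses messages and errors, and the final - sends the extracted text to standard output instead of a file.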
// Path to the pdftotext executable; adjust it to your installation.
private String executeStr = "D:\\xpdf\\xpdf-3.02pl2-win32\\pdftotext.exe";

public String getContent() {
    String[] cmd = new String[] { executeStr, "-enc", "UTF-8", "-q", file.getAbsolutePath(), "-" };
    // Created up front so that sb.toString() is safe even if exec() fails.
    StringBuffer sb = new StringBuffer();
    BufferedReader br = null;
    try {
        Process p = Runtime.getRuntime().exec(cmd);
        // Decode the process output with the same encoding requested via -enc.
        br = new BufferedReader(new InputStreamReader(p.getInputStream(), "UTF-8"));
        String line = br.readLine();
        while (line != null) {
            System.out.println(line);
            sb.append(line);
            sb.append(" ");
            line = br.readLine();
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        // br is still null if exec() threw, so guard before closing.
        if (br != null) {
            try {
                br.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    content = sb.toString();
    return content;
}
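The snippet above does not look at the exit code of pdftotext, so a failed conversion returns an empty string without any signal. One possible variation, offered only as a sketch, uses ProcessBuilder and reports a non-zero exit code; the class PdfToTextRunner and its method names are hypothetical, and the executable path is taken from the command line rather than hard-coded.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper around pdftotext; not part of xpdf.
final class PdfToTextRunner {
    /** Builds the same argument list used above: UTF-8 output, quiet, text to stdout. */
    static List<String> buildCommand(String executable, File pdf) {
        return Arrays.asList(executable, "-enc", "UTF-8", "-q", pdf.getAbsolutePath(), "-");
    }

    /** Runs the command, returns its standard output, and fails on a non-zero exit code. */
    static String run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        // Send stderr to the parent's stderr so it neither blocks nor mixes into the text.
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process p = pb.start();
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("pdftotext exited with code " + exit);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to pdftotext.exe, args[1]: path to the PDF file.
        System.out.print(run(buildCommand(args[0], new File(args[1]))));
    }
}
```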
A demo application
package com.cs;

// Common interface for document parsers used by the demo.
public interface Parsable {
    public String getTitle();
    public String getContent();
    public String getSummary();
}
package com.cs;

import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;

public class PdfParser implements Parsable {
    private File file;
    private String content; // extracted text, cached after the first call

    /*
     * Path to pdftotext.exe; must be configured for PDF parsing to work.
     */
    private String executeStr = "E:\\EclipseStudyWorkspace\\LuceneParse\\xpdf\\xpdf-3.02pl2-win32\\pdftotext.exe";

    public PdfParser(File file) {
        this.file = file;
    }

    public String getContent() {
        if (content != null) {
            return content;
        }
        String[] cmd = new String[] { executeStr, "-enc", "UTF-8", "-q", file.getAbsolutePath(), "-" };
        BufferedReader br = null;
        StringBuffer sb = new StringBuffer();
        try {
            Process p = Runtime.getRuntime().exec(cmd);
            // Read the text that pdftotext writes to standard output.
            br = new BufferedReader(new InputStreamReader(p.getInputStream(), "UTF-8"));
            String str = null;
            while ((str = br.readLine()) != null) {
                sb.append(str).append("\n");
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (br != null) {
                try {
                    br.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
        content = sb.toString();
        return content;
    }

    public String getSummary() {
        String summary;
        if (content == null) {
            getContent();
        }
        // The summary is the first 200 characters of the content.
        if (content.length() > 200) {
            summary = content.substring(0, 200);
        } else {
            summary = content;
        }
        return summary;
    }

    public String getTitle() {
        return file.getName();
    }

    public static void main(String[] args) {
        PdfParser parser = new PdfParser(new File("E:\\EclipseStudyWorkspace\\LuceneParse\\fileSource\\123.pdf"));
        System.out.println("pdf content : " + parser.getContent());
    }
}
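One detail of getSummary: substring(0, 200) counts UTF-16 code units, so a summary could in principle end in the middle of a surrogate pair when the text contains characters outside the Basic Multilingual Plane. A possible alternative, shown here purely as a sketch with a hypothetical SummaryUtil class, truncates by code points instead.

```java
// Hypothetical helper: truncates text without splitting a surrogate pair.
final class SummaryUtil {
    /** Returns at most maxCodePoints code points from the start of content. */
    static String summarize(String content, int maxCodePoints) {
        if (content.codePointCount(0, content.length()) <= maxCodePoints) {
            return content;
        }
        // Translate the code-point count into a safe char index for substring.
        int end = content.offsetByCodePoints(0, maxCodePoints);
        return content.substring(0, end);
    }

    public static void main(String[] args) {
        System.out.println(summarize("xpdf extracts text from PDF files", 4));
    }
}
```

In the demo class this would stand in for the length check and substring call, for example summarize(content, 200).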
Project structure
-projectName
----src
----webroot
----xpdf
--------xpdf-3.02pl2-win32
--------xpdf-chinese-simplified