longgangbai

POI 3.8 Component Study (8): Writing Files with SXSSF (Streaming Usermodel API)

 

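In POI 3.8, SXSSF supports only the Excel 2007 (.xlsx) format and is a streaming extension of XSSF. It is meant for generating Excel files that contain large amounts of data: rows held in memory are flushed to disk as writing proceeds, which keeps memory usage low and makes writing large files more efficient.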

The official documentation reads as follows:

SXSSF (Streaming Usermodel API)

Note
SXSSF is a brand new contribution and some features were added after it was first introduced in POI 3.8-beta3. Users are advised to try the latest build from trunk. Build instructions are available on the Apache POI website.

SXSSF (package: org.apache.poi.xssf.streaming) is an API-compatible streaming extension of XSSF to be used when very large spreadsheets have to be produced, and heap space is limited. SXSSF achieves its low memory footprint by limiting access to the rows that are within a sliding window, while XSSF gives access to all rows in the document. Older rows that are no longer in the window become inaccessible, as they are written to the disk.

You can specify the window size at workbook construction time via new SXSSFWorkbook(int windowSize) or you can set it per-sheet via SXSSFSheet#setRandomAccessWindowSize(int windowSize).

When a new row is created via createRow() and the total number of unflushed records would exceed the specified window size, then the row with the lowest index value is flushed and cannot be accessed via getRow() anymore.

The default window size is 100 and defined by SXSSFWorkbook.DEFAULT_WINDOW_SIZE.

A windowSize of -1 indicates unlimited access. In this case all records that have not been flushed by a call to flushRows() are available for random access.
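The window rules described above can be illustrated with a small toy model. This is only a sketch in plain Java, not POI's implementation: the class RowWindowSketch and its members are hypothetical names made up for this illustration, rows are plain strings, and "flushing" simply drops the row where a real implementation would write it to a temp file.

```java
import java.util.TreeMap;

/** Toy model of a sliding row window. Hypothetical illustration, not POI code. */
public class RowWindowSketch {
    // Rows currently held in memory, ordered by row index.
    private final TreeMap<Integer, String> window = new TreeMap<>();
    private final int windowSize;   // -1 means unlimited, mirroring the SXSSF convention
    private int flushedCount = 0;

    public RowWindowSketch(int windowSize) {
        this.windowSize = windowSize;
    }

    /** Adds a row; if the window overflows, the lowest-index rows are flushed. */
    public void createRow(int rownum) {
        window.put(rownum, "row " + rownum);
        if (windowSize >= 0 && window.size() > windowSize) {
            flushRows(windowSize);
        }
    }

    /** Flushes lowest-index rows until at most `remaining` rows stay in memory. */
    public void flushRows(int remaining) {
        while (window.size() > remaining) {
            window.pollFirstEntry();   // a real implementation would write the row to disk here
            flushedCount++;
        }
    }

    /** Returns the row if it is still in the window, or null once it has been flushed. */
    public String getRow(int rownum) {
        return window.get(rownum);
    }

    public int flushedCount() {
        return flushedCount;
    }

    public static void main(String[] args) {
        RowWindowSketch sheet = new RowWindowSketch(100);
        for (int r = 0; r < 1000; r++) {
            sheet.createRow(r);
        }
        System.out.println(sheet.getRow(899));   // flushed row
        System.out.println(sheet.getRow(900));   // row still inside the window
        System.out.println(sheet.flushedCount());
    }
}
```

With a window size of -1 nothing is flushed automatically in this model, and an explicit flushRows(100) call keeps only the 100 highest-index rows in memory.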

The example below writes a sheet with a window of 100 rows. When the row count reaches 101, the row with rownum=0 is flushed to disk and removed from memory; when the row count reaches 102, the row with rownum=1 is flushed, and so on.

 

The test code is as follows:

package com.easyway.excel.events.stream;

import java.io.FileOutputStream;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.util.CellReference;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;
/**
 * SXSSF (Streaming Usermodel API)
 *     When the amount of data being written is very large, rows held in memory
 * are flushed to disk to reduce memory usage. This trades disk space for
 * memory and improves efficiency.
 * 
 * @Title: 
 * @Description: TODO
 * @Copyright:Copyright (c) 2011
 * @Company:易程科技股份有限公司
 * @Date:2012-6-17
 * @author  longgangbai
 * @version 1.0
 */
public class SXSSExcelEvent {
    public static void main(String[] args) throws Throwable {
        // Create a streaming workbook
        Workbook wb = new SXSSFWorkbook(100); // keep 100 rows in memory, exceeding rows will be flushed to disk
        // SXSSFWorkbook wb = new SXSSFWorkbook();
        // wb.setCompressTempFiles(true); // temp files will be gzipped
        Sheet sh = wb.createSheet();
        // createRow adds rows to the in-memory window; once the window is full,
        // the oldest rows are flushed to disk
        for (int rownum = 0; rownum < 1000; rownum++) {
            Row row = sh.createRow(rownum);
            for (int cellnum = 0; cellnum < 10; cellnum++) {
                Cell cell = row.createCell(cellnum);
                String address = new CellReference(cell).formatAsString();
                cell.setCellValue(address);
            }
        }

        // Rows with rownum < 900 are flushed and not accessible
        // getRow does not flush anything itself: it returns null for rows that
        // have already been flushed to disk, so this loop prints null
        for (int rownum = 0; rownum < 900; rownum++) {
            System.out.println(sh.getRow(rownum));
        }

        // the last 100 rows are still in memory
        for (int rownum = 900; rownum < 1000; rownum++) {
            System.out.println(sh.getRow(rownum));
        }
        // Write the workbook to a file
        FileOutputStream out = new FileOutputStream("C://sxssf.xlsx");
        wb.write(out);
        // Close the file stream
        out.close();
        System.out.println("Streaming write finished!");
    }
}
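The test code fills every cell with its own address, as returned by CellReference.formatAsString(). To show what those values look like, here is a minimal sketch of the A1-style conversion in plain Java. It is not POI's code: CellAddressSketch, columnLetters and address are hypothetical names, and the sketch assumes the simple relative form without a sheet name or `$` markers.

```java
/** Sketch of A1-style cell addresses. Hypothetical helper, not part of POI. */
public class CellAddressSketch {

    /** Converts a 0-based column index to letters: 0 -> A, 25 -> Z, 26 -> AA. */
    public static String columnLetters(int col) {
        StringBuilder sb = new StringBuilder();
        // Work in 1-based "bijective base 26": there is no zero digit.
        for (int n = col + 1; n > 0; n = (n - 1) / 26) {
            sb.insert(0, (char) ('A' + (n - 1) % 26));
        }
        return sb.toString();
    }

    /** Builds the address for 0-based row and column indexes, e.g. (0, 0) -> A1. */
    public static String address(int row, int col) {
        return columnLetters(col) + (row + 1);
    }

    public static void main(String[] args) {
        System.out.println(address(0, 0));     // first cell written by the test code
        System.out.println(address(999, 9));   // last cell written by the test code
    }
}
```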
 

SXSSF flushes sheet data in temporary files (a temp file per sheet) and the size of these temporary files can grow to a very large value. For example, for 20 MB of CSV data the size of the temp XML becomes more than a gigabyte. If the size of the temp files is an issue, you can tell SXSSF to use gzip compression:

  SXSSFWorkbook wb = new SXSSFWorkbook(); 
  wb.setCompressTempFiles(true); // temp files will be gzipped
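To get a feel for why gzipping the temp files helps, the following sketch compresses some repetitive sheet-like XML with the JDK's GZIPOutputStream and compares the sizes. It is an illustration under stated assumptions, not what POI does internally: TempFileGzipSketch is a hypothetical class, the markup is a simplified stand-in for real sheet XML, and the data stays in memory instead of going to a temp file.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

/** Compares raw and gzipped sizes of sheet-like XML. Hypothetical illustration. */
public class TempFileGzipSketch {

    /** Builds simplified row/cell markup: `rows` rows of 10 cells, each holding its own address. */
    public static byte[] sheetXml(int rows) {
        StringBuilder sb = new StringBuilder();
        for (int r = 1; r <= rows; r++) {
            sb.append("<row r=\"").append(r).append("\">");
            for (int c = 0; c < 10; c++) {
                String ref = (char) ('A' + c) + Integer.toString(r);
                sb.append("<c r=\"").append(ref).append("\" t=\"inlineStr\"><is><t>")
                  .append(ref).append("</t></is></c>");
            }
            sb.append("</row>\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    /** Gzips a byte array in memory. */
    public static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = sheetXml(1000);
        byte[] packed = gzip(raw);
        System.out.println("raw bytes:     " + raw.length);
        System.out.println("gzipped bytes: " + packed.length);
    }
}
```

The trade-off is extra CPU work for compressing while rows are flushed and decompressing when the final workbook is written, so the setting is mainly worth enabling when temp file size is the limiting factor.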


Comments
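#4 ae6623 2013-09-17  
ae6623 wrote:
This line from the original poster:

// When rows are accessed with getRow, the in-memory data is flushed to disk.

What does it mean? Can the memory not be released without calling getRow()? Does it have to be called once to release it?


Oh, I get it. The original poster prints the result to see whether it is null, which confirms that the rows have already been released from memory. The original was written like this: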

        // Rows with rownum < 900 are flushed and not accessible
        for (int rownum = 0; rownum < 900; rownum++) {
            Assert.assertNull(sh.getRow(rownum));
            Assert.assertNull(sh1.getRow(rownum));
        }

        // the last 100 rows are still in memory
        for (int rownum = 900; rownum < 1000; rownum++) {
            Assert.assertNotNull(sh.getRow(rownum));
            Assert.assertNotNull(sh1.getRow(rownum));
        }
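#3 ae6623 2013-09-17  
This line from the original poster:

// When rows are accessed with getRow, the in-memory data is flushed to disk.

What does it mean? Can the memory not be released without calling getRow()? Does it have to be called once to release it?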
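#2 longgangbai 2012-07-04  
1R1 wrote:
SXSSF is not very fast: 1,000,000 rows take about 7 to 8 minutes. Has the original poster done any tuning based on rowAccessWindowSize that could be shared? Or does the SXSSF code itself have to be modified?

Sorry, I have not tried that. With a data volume as large as 1,000,000 rows, it is best to flush to disk periodically inside the loop to reduce memory usage. That is my understanding. I hope we can keep discussing this!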
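#1 1R1 2012-06-26  
SXSSF is not very fast: 1,000,000 rows take about 7 to 8 minutes. Has the original poster done any tuning based on rowAccessWindowSize that could be shared? Or does the SXSSF code itself have to be modified?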
