113

.NET使用Office Open XML导出大量数据到 Excel - RHINO_WU

 6 years ago
source link: https://www.cnblogs.com/rhino/p/8283219.html
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

.NET使用Office Open XML导出大量数据到 Excel

我相信很多人在做项目的都碰到过Excel数据导出的需求,我从最开始使用最原始的HTML拼接(将需要导出的数据拼接成TABLE标签)到后来happy的使用开源的NPOI, EPPlus等开源组件导出EXCEL,但不久前,我在一个项目碰到一个需求:要将几个分别有近60多万的数据源导出到Excel中,我们先不要讨论这个需求本身是否合理,客户就是要这样。我先后用NPOI和EPPlus,都发现同一个问题:OutOfMemoryException,我电脑12G内存居然不够用?

的确内存溢出了,但内存还剩下好几个G的,就会溢出,我用 .NET做的网站,开发的时候Host应该是Visual Studio安装的IIS Express, 应该是VS本身的限制,不过在网上查阅资料也没发现这的确也是困扰一些人的,也没查到什么结果,好在还有Google, 跃过墙外,在Stack Overflow上查到资料: OpenXML , 这不是什么新技:  Office 2007在设计的时候, 为了更好的和其它应用程序交互,使用了XML + ZIP技术来实现excel, world, PPT等组件的本地保存, 我们所使用xlsx, dox, pptx文件本质上就一个ZIP压缩包,包内是组织好的XML文件,也就是说,我们可以通过生成, 修改, 生成合规的XML文件,再压缩成ZIP包,这就是一个可以被Office识别的文件了。

用图说话:

204983-20180114125700644-168293098.png
204983-20180114125706676-1746623323.png
204983-20180114125724832-1865745498.png

在园子里其实也有不少人介绍过 Open XML, 我想就多一个视角来介绍Open XML吧,好像也有很长时间没人写关于这个博文。

什么是Office Open XML?

我们来看下维基百科的定义:

Office Open XML (also informally known as OOXML or Microsoft Open XML (MOX)[2) is a zipped, XML-based file format developed by Microsoft[3] for representing spreadsheets, charts, presentations and word processing documents. The format was initially standardized by Ecma (as ECMA-376), and by the ISO and IEC (as ISO/IEC 29500) in later versions.

Starting with Microsoft Office 2007, the Office Open XML file formats have become the default[4] target file format of Microsoft Office.[5][6] Microsoft Office 2010 provides read support for ECMA-376, read/write support for ISO/IEC 29500 Transitional, and read support for ISO/IEC 29500 Strict.[7] Microsoft Office 2013 and Microsoft Office 2016 additionally support both reading and writing of ISO/IEC 29500 Strict.[8]re

refer: https://en.wikipedia.org/wiki/Office_Open_XML

从Office 2007开始,就开始使用XML文件格式作为Microsoft Office的默认保存方式,其实我们通常用的NPOI  office 2007部分和EPPlus就是使用Open XML来开发的。

为什么同是使用Open XML, NPOI和EPPLus会出现内存溢出的问题?

 这两个开源组件有对Office套件有着很全面的支持,它们会把数据加载到内存中一次性处理,如果碰到数据量过大,就很可能 遇到这个问题,网上EPPlus在20多万条数据的就溢出了,NPOI在11多万的时候就会溢出, 这个是和数据的列数和内容有关系,不管怎样,我们以后可能是会碰到这种大量数据的EXCEL导出,我们不需要很复杂的功能,就是想要导出一个EXCEL列表,这其实是可以做到的。

Open XML怎样做不会内存溢出?

NPOI和EPPlus在导出大量数据 的Excel列表时可能 会发生内存溢出的问题,原因是它们都把数据保存在内存中,因为它们支持各种复杂的功能,那么简单的列表,就是数量超大,我们把它通过文件流写入磁盘,这个问题就解决了。

如何使用OPEN XML?

我们需要去微软官网下载OFFICE OPEN XML的SDK,链接: https://www.microsoft.com/en-hk/download/details.aspx?id=30425,推荐使用NuGet在VISULAL STUDIO直接将引用添加到Project。

在GitHub还有一些示例代码:https://github.com/OfficeDev/Open-XML-SDK

代码实现

说了这么多废话,我们看如何用OPEN XML实现一个EXCEL列表的导出:

从原理上讲就是用OpenXML一个一个把标签写入本地磁盘。

我截取我写的导出类的几个方法来来解释:

/// <summary>
/// 指定磁盘路径初始化OpenWorkDoucment
/// </summary>
/// <param name="fileName"></param>
private void OpenWorkDocument(string fileName)
{
     document = SpreadsheetDocument.Create(fileName, SpreadsheetDocumentType.Workbook);
}
///<summary>
///用datatable作为数据源,实际情况可以根据需要调整
///</summary>
public void AddSheet(DataTable dt, string sheetName)
        {
            if (dt == null || dt.Rows.Count == 0)
            {
                throw new ArgumentNullException(nameof(dt), "data source can not be null");
            }

            if (document == null)
            {
                throw new ArgumentNullException(nameof(document), "please init document first");
            }


            //this list of attributes will be used when writing a start element
            List<OpenXmlAttribute> attributes;
       //这是我们为什么不会溢出的关键点, 使用XmlWriter写入磁盘 OpenXmlWriter writer; WorksheetPart workSheetPart = document.WorkbookPart.AddNewPart<WorksheetPart>(); writer = OpenXmlWriter.Create(workSheetPart);
    //使用OpenXML麻烦的地方就是我们要用SDK去拼接XML内容 writer.WriteStartElement(new Worksheet()); writer.WriteStartElement(new SheetViews()); //sheetViews writer.WriteStartElement(new SheetView() //sheetView { TabSelected = true, WorkbookViewId = 0U //这里的下标是从0开始的 });         //这里是冻结列头,别问为什么是A2,我试了A1不行 Pane pane = new Pane() { State = new EnumValue<PaneStateValues>(PaneStateValues.Frozen), VerticalSplit = new DoubleValue((double)1), TopLeftCell = new StringValue("A2"), ActivePane = new EnumValue<PaneValues>(PaneValues.BottomLeft) };        //对于一些文档本身的结构的描述,我们可以直接把准备属性设置正确,直接写入,因为描述实例很占用资源小,当然我们也可以把描述结点的子节点,子子节点都通过WriteStartElememt写入,不过很麻烦,容易出错 writer.WriteStartElement(pane); //Pane writer.WriteEndElement(); //Pane writer.WriteStartElement(new Selection() { Pane = new EnumValue<PaneValues>(PaneValues.BottomLeft) }); writer.WriteEndElement(); //Selection 关闭标签 writer.WriteEndElement(); //sheetView 关闭标签 writer.WriteEndElement(); //sheetViews 关闭标签 writer.WriteStartElement(new SheetData()); var rowIndex = 0; foreach (DataRow row in dt.Rows) { //build header if (rowIndex == 0) { //create a new list of attributes attributes = new List<OpenXmlAttribute>(); // add the row index attribute to the list attributes.Add(new OpenXmlAttribute("r", null, (rowIndex + 1).ToString())); //header start writer.WriteStartElement(new Row(), attributes); foreach (DataColumn col in dt.Columns) { attributes = new List<OpenXmlAttribute>(); //这里注意,在Excel在处理字符串的时候,会将所有的字符串保存到sharedStrings.xml, cell内写入在sharedString.XML的索引, 属性t(type)设置为s(str)//我们在导出excel的时候把sharedString.mxl考虑进来会加大复杂程度,所以将t设置为str, 一个不存在的type, excel会直接解析cell内的字串值 attributes.Add(new OpenXmlAttribute("t", null, "str")); //通过s指定style样式的下标 attributes.Add(new OpenXmlAttribute("s", null, FORMAT_INDEX_HEADER.ToString()));
//能过r指定单元格位置,好像不是必需, 注意这里下标位置是从1开始的 attributes.Add(new OpenXmlAttribute("r", "", string.Format("{0}{1}", GetColumnName(col.Ordinal + 1), rowIndex + 1))); writer.WriteStartElement(new Cell(), attributes); writer.WriteElement(new CellValue(col.ColumnName)); writer.WriteEndElement(); } //header end writer.WriteEndElement(); rowIndex++; } //数据写入,我们通过xmlWriter不会触发异常//create a new list of attributes attributes = new List<OpenXmlAttribute>(); // add the row index attribute to the list attributes.Add(new OpenXmlAttribute("r", null, (rowIndex + 1).ToString())); //header start writer.WriteStartElement(new Row(), attributes); foreach (DataColumn col in dt.Columns) { attributes = new List<OpenXmlAttribute>(); switch (col.DataType.ToString()) { case "System.Int32": attributes.Add(new OpenXmlAttribute("s", null, FORMAT_INDEX_INT.ToString())); attributes.Add(new OpenXmlAttribute("t", null, "n")); //number break; case "System.Double": case "System.Decimal": case "System.Float": attributes.Add(new OpenXmlAttribute("s", null, FORMAT_INDEX_DEC.ToString())); //header style attributes.Add(new OpenXmlAttribute("t", null, "n")); //number break; default: attributes.Add(new OpenXmlAttribute("s", null, FORMAT_INDEX_STR.ToString())); //header style attributes.Add(new OpenXmlAttribute("t", null, "str")); //string break; } //add the cell reference attribute attributes.Add(new OpenXmlAttribute("r", null, string.Format("{0}{1}", GetColumnName(col.Ordinal + 1), rowIndex + 1))); writer.WriteStartElement(new Cell(), attributes); writer.WriteElement(new CellValue(row[col.Ordinal].ToString())); writer.WriteEndElement(); } //header end writer.WriteEndElement(); rowIndex++; } // End SheetData writer.WriteEndElement(); // End Worksheet writer.WriteEndElement(); writer.Close(); if (document.WorkbookPart.Workbook == null) { document.WorkbookPart.Workbook = new Workbook(); document.WorkbookPart.Workbook.Append(new Sheets()); } //数据写入完成后,注册一个sheet引用到workbook.xml, 也就是在excel最下面的sheet name var sheet = new Sheet() { Name = !String.IsNullOrWhiteSpace(sheetName) ? sheetName : ("Sheet " + DateTime.Now.ToString("ms")), SheetId = UInt32Value.FromUInt32((uint)m_sheetIndex++), Id = document.WorkbookPart.GetIdOfPart(workSheetPart) }; document.WorkbookPart.Workbook.Sheets.Append(sheet); }
//生成Style样式, 注意下标从0开始, 依次加1, 如果有跳过1直接设置3这样情况, 可能无法正常解析到样式
private Stylesheet GenerateStylesheet()
        {
            Stylesheet styleSheet = null;

            Fonts fonts = new Fonts(
                new Font( // Index 0 - default
                    new FontSize() { Val = 11 }

                ),
                new Font( // Index 1 - header
                    new FontSize() { Val = 11 },
                    new Bold(),
                    new Color() { Rgb = "FFFFFF" }

                ));

            Fills fills = new Fills(
                    new Fill(new PatternFill() { PatternType = PatternValues.None }), // Index 0 - default
                    new Fill(new PatternFill() { PatternType = PatternValues.Gray125 }), // Index 1 - default
                    new Fill(new PatternFill(new ForegroundColor { Rgb = new HexBinaryValue() { Value = "0070c0" } }) { PatternType = PatternValues.Solid })
                );



            Borders borders = new Borders(
                    new Border(), // index 0 default
                    new Border( // index 1 black border
                        new LeftBorder(new Color() { Auto = true }) { Style = BorderStyleValues.Thin },
                        new RightBorder(new Color() { Auto = true }) { Style = BorderStyleValues.Thin },
                        new TopBorder(new Color() { Auto = true }) { Style = BorderStyleValues.Thin },
                        new BottomBorder(new Color() { Auto = true }) { Style = BorderStyleValues.Thin },
                        new DiagonalBorder())
                );

            NumberingFormats numbers = new NumberingFormats(
                new NumberingFormat() { NumberFormatId = 0, FormatCode = new StringValue("#,##0.00") },
                new NumberingFormat() { NumberFormatId = 1, FormatCode = new StringValue("0") }
                );


            CellFormats cellFormats = new CellFormats(
                    // default
                    new CellFormat() { FormatId = FORMAT_INDEX_DEFUALT },
                    // body  string
                    new CellFormat { FormatId = FORMAT_INDEX_STR, FontId = 0, FillId = 0, BorderId = 1, ApplyBorder = true },
                    // body decimal
                    new CellFormat { FormatId = FORMAT_INDEX_DEC, FontId = 0, FillId = 0, BorderId = 1, NumberFormatId = 0, ApplyBorder = true },
                    //header
                    new CellFormat { FormatId = FORMAT_INDEX_HEADER, FontId = 1, FillId = 2, BorderId = 1, ApplyFill = true }, // header
                    // body  int
                    new CellFormat { FormatId = FORMAT_INDEX_INT, FontId = 0, FillId = 0, BorderId = 1, NumberFormatId = 1, ApplyBorder = true }
                );

            styleSheet = new Stylesheet(numbers, fonts, fills, borders, cellFormats);
            return styleSheet;  
        }  
private void WriteWorkbookStyle()
        {
            if (document != null)
            {
                WorkbookStylesPart stylePart = document.WorkbookPart.AddNewPart<WorkbookStylesPart>();
                var styleSheet = GenerateStylesheet();
                styleSheet.Save(stylePart);
            }

        }

 设置样式,冻结首行,这些都可以简单完成,如果需要添加图表什么的,还是建议用NPOI, EPPlus等开源方案,有图表的excel不会太大。

对于Open XML的介绍就到这里了,有什么错误的地方,请指正。


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK