export existing aspx page to pdf with itextsharp-turkish character problem.

  • Question

  • User-580328530 posted

    hi all,

    I am using itextsharp.dll to export webpage to pdf. the code is like this:

    Protected Sub TekTik1_Click(ByVal sender As Object, ByVal e As System.EventArgs)
    dvText.Visible = True
    Dim attachment As String = "attachment; filename=Article.pdf"
    Response.ClearContent()        
    Response.AddHeader("content-disposition", attachment)
    Response.ContentType = "application/pdf"
    Response.ContentEncoding = System.Text.Encoding.UTF8        
    Dim stw As New StringWriter()        
    Dim htextw As New HtmlTextWriter(stw)   
    dvText.RenderControl(htextw)       
    Dim document As New Document()        
    PdfWriter.GetInstance(document, Response.OutputStream)
    document.Open()
    Dim str As New StringReader(stw.ToString())
    Dim htmlworker As New HTMLWorker(document)
    htmlworker.Parse(str)
    document.Close()
    Response.Write(document)
    Response.[End]()     
    dvText.Visible = False
    End Sub

    Public Overrides Sub VerifyRenderingInServerForm(ByVal control As Control)
    End Sub


    I have a problem with Turkish characters. By googling I found a function that is supposed to solve it:

        
    Private Function ReplaceRegex(ByVal a As String) As String
        ' Note: VB.NET string literals and Regex replacement patterns do not process
        ' "\uXXXX" escapes, so these calls insert the literal text, not the character.
        a = Regex.Replace(a, "ı", "\u0131")
        a = Regex.Replace(a, "ç", "\u00e7")
        a = Regex.Replace(a, "ü", "\u00fc")
        a = Regex.Replace(a, "ö", "\u00f6")
        a = Regex.Replace(a, "ğ", "\u011f")
        a = Regex.Replace(a, "Ü", "\u00dc")
        a = Regex.Replace(a, "Ö", "\u00d6")
        a = Regex.Replace(a, "Ğ", "\u011e")
        a = Regex.Replace(a, "Ç", "\u00c7")
        a = Regex.Replace(a, "İ", "\u0130")
        Return a
    End Function


    But I do not know how to apply this function to the HTML that htmlworker.Parse() reads.

    I would appreciate it if you can help me.

    thanks.

    OzerS

    Sunday, October 24, 2010 10:00 AM

Answers

  • User1520641890 posted

    hi,

    sorry to say this, but some of the examples you find with Google on the Internet aren't so good. in particular the following aren't needed:

    1. Dim htmlworker As New HTMLWorker(document)  
    2. htmlworker.Parse(str)
    3. Response.Write(document)

    here is a simple example:

    <%@ Page Language="VB" %>
    <%@ Import Namespace='System.Collections.Generic' %>
    <%@ Import Namespace='System.IO' %>
    <%@ Import Namespace='iTextSharp.text' %>
    <%@ Import Namespace='iTextSharp.text.pdf' %>
    <%@ Import Namespace='iTextSharp.text.html.simpleparser' %>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    
    <script runat="server">
      Protected Sub getPdf(ByVal sender As Object, ByVal e As CommandEventArgs)
        Context.Response.ContentType = "application/pdf"
        Using stw = New StringWriter()
          Using htextw = New HtmlTextWriter(stw)
            sampleText.RenderControl(htextw)
            Using str = New StringReader(stw.ToString())
              Dim doc As Document = New Document
              PdfWriter.GetInstance(doc, Context.Response.OutputStream)
              doc.Open()
              Dim elements As List(Of IElement) = HTMLWorker.ParseToList( _
                str, Nothing _
              )
              For Each element In elements
                doc.Add(element)
              Next
              doc.Close()
              Response.End()
            End Using
          End Using
        End Using
      End Sub
    </script>
    
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head runat="server"><title></title></head>
    <body><form id="form1" runat="server">
    <asp:Label id='sampleText' runat='server' Text='ıçüöğÜÖĞÇİ' />
    <asp:Button runat='server'
      oncommand='getPdf'
      text='Submit'
    />
    </form></body></html>

    as you may notice, i replaced the three numbered lines above with a single call to HTMLWorker.ParseToList() and a loop that adds each parsed element to the document. please try the example and let me know if it works - it did for me as-is.

    if it doesn't, you may need to explicitly embed a font that contains the Turkish glyphs into the PDF...
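
    A minimal sketch of what explicit font embedding could look like, assuming iTextSharp 5.x. The font path, the "turkish-font" alias, and the TurkishPdfSketch/HtmlToPdf names are assumptions made for illustration and are not from the thread. The idea is to register a TrueType font that has the Turkish glyphs and ask HTMLWorker, through a StyleSheet, to use it with Identity-H encoding:

```vb
Imports System.Collections.Generic
Imports System.IO
Imports iTextSharp.text
Imports iTextSharp.text.pdf
Imports iTextSharp.text.html.simpleparser

Module TurkishPdfSketch
    ' Assumption: a TrueType font with Turkish glyphs exists at this path.
    Private Const FontPath As String = "C:\Windows\Fonts\arial.ttf"

    ' Hypothetical helper: converts simple HTML to PDF bytes with the font applied.
    Function HtmlToPdf(ByVal html As String) As Byte()
        ' Register the font under an alias that the style sheet can refer to.
        FontFactory.Register(FontPath, "turkish-font")

        ' Apply the font and a Unicode encoding to everything under <body>.
        Dim styles As New StyleSheet()
        styles.LoadTagStyle("body", "face", "turkish-font")
        styles.LoadTagStyle("body", "encoding", BaseFont.IDENTITY_H)

        Using ms As New MemoryStream()
            Dim doc As New Document()
            PdfWriter.GetInstance(doc, ms)
            doc.Open()
            Using reader As New StringReader(html)
                ' Same ParseToList call as in the example, but with a style sheet.
                For Each element As IElement In HTMLWorker.ParseToList(reader, styles)
                    doc.Add(element)
                Next
            End Using
            doc.Close()
            ' ToArray() still works after the writer has closed the stream.
            Return ms.ToArray()
        End Using
    End Function
End Module
```

    In the page above, the returned bytes could be sent with Context.Response.BinaryWrite(...) instead of writing straight to Response.OutputStream. Whether the "body" style reaches every element may depend on the iTextSharp version, so treat this as a starting point rather than a guaranteed fix.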

    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Monday, October 25, 2010 12:21 AM

All replies

  • User-580328530 posted

    thanks for the answer. it works, but I forgot to tell you that I have a div which contains databound gridview and datalist controls laid out in a table. can I convert it with your method? I tried replacing the label with the div, but it did not work and gives the error: The document has no pages.

    Monday, October 25, 2010 3:07 AM
  • User1520641890 posted

    I think I forgot to tell you that I have a div which contains gridview and datalist controls, databound inside and organized with a table

    it seems that question is asked quite frequently here. to be totally honest, i'm not sure because i stay away from using the gridview/datalist controls. IMHO they are overkill for most situations - i do mainly Intranet applications, so i can get away with writing fat clients using ajax/json.

    what i suggest you try: after calling RenderControl() on your server-side control (the DIV does have runat='server', correct?), use one of the .NET XML parsers to get a stripped-down version of the HTML content you need. the reason is that HTMLWorker can only handle very simple HTML markup, and i'm guessing the HTML generated by your gridview/datalist is causing the problem.
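
    One way the stripping step could look, as a rough sketch using LINQ to XML. HtmlSimplifier and Simplify are hypothetical names, not part of ASP.NET or iTextSharp, and the sketch assumes the rendered markup is well-formed XML apart from &nbsp; entities. It removes every attribute except colspan and rowspan, so only bare tags and text are handed to HTMLWorker:

```vb
Imports System
Imports System.Linq
Imports System.Xml.Linq

Module HtmlSimplifier
    ' Attributes kept because they affect table layout; everything else is dropped.
    Private ReadOnly KeptAttributes As String() = New String() {"colspan", "rowspan"}

    ' Hypothetical helper. Assumes the input is well-formed XML apart from &nbsp;.
    Function Simplify(ByVal renderedHtml As String) As String
        ' &nbsp; is not a predefined XML entity, so swap it for a numeric reference.
        Dim xml As String = renderedHtml.Replace("&nbsp;", "&#160;")

        ' Wrap in a single root so fragments with several top-level tags still parse.
        Dim root As XElement = XElement.Parse("<div>" & xml & "</div>")

        ' Copy to a list first so attributes can be removed while iterating.
        For Each el As XElement In root.DescendantsAndSelf().ToList()
            For Each attr As XAttribute In el.Attributes().ToList()
                If Not KeptAttributes.Contains(attr.Name.LocalName) Then
                    attr.Remove()
                End If
            Next
        Next

        Return root.ToString(SaveOptions.DisableFormatting)
    End Function
End Module
```

    In the earlier example this would slot in as New StringReader(HtmlSimplifier.Simplify(stw.ToString())). If the controls emit markup that is not well-formed, XElement.Parse will throw, and a more tolerant HTML parser would be needed instead.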

    at least now you know the turkish characters can be handled :)


    Monday, October 25, 2010 7:53 AM