Creating a class to stop UTF-8 entity conversion and remove extra space from self-closing nodes?

  • Question

  • Hi,

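    I'm trying to stop the XML parser from converting hex character references (such as &#x00F4;) into the characters they represent. I also want to stop the extra space from being added before the slash in self-closing nodes (<hobby /> instead of <hobby/>) when the document is saved.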

    I've created a class (tDocument), shown below:

    using System.IO;
    using System.Xml.Linq;
    class tDocument:XDocument
    {
    	string input_string;
    	int option;
    	
    	
    	public tDocument(string input,int opt)
    	{
    		this.input_string=input;
    		this.option=opt;
    	}
    	public static XDocument tParse(string path)
    	{
    		string file_content = escape_string(File.ReadAllText(path), 0);
    		XDocument doc = XDocument.Parse(file_content, LoadOptions.PreserveWhitespace);
    		return doc;
    	}
    	public static void tSave(string path, XDocument doc)
    	{
    		doc.Save(path, SaveOptions.DisableFormatting);
    		File.WriteAllText(path, escape_string(doc.ToString(), 1));
    	}
    	public static string escape_string(string input_string, int option)
    	{
    		switch (option)
    		{
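    			case 0:
    				// Before parsing: escape every '&' so that "&#x00F4;" is read as literal text
    				return input_string.Replace("&", "&amp;");
    			case 1:
    				// After serializing: drop the space before "/>" and undo the '&' escaping
    				var x = input_string.Replace(" />", "/>");
    				var y = x.Replace("&amp;", "&");
    				return y;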
    
    			default:
    				return null;
    
    		}
    	}
    }

    Here is an example of how I'm using this class. The program gets the value of the ID node for each name from info.xml and adds it as an id attribute on the matching name node:

    XDocument myfile=tDocument.tParse(@"D:\test\APril 2018\testing.xml");
    var names=myfile.Descendants("name").ToList();
    foreach (var name in names)
    {
    	XDocument infofile=tDocument.tParse(@"D:\test\APril 2018\info.xml");
    	var data=infofile.Descendants("student").Where(x=>x.Element("s-name").Value==name.Value).Select(y=>y.Element("ID").Value).First();
    	name.Add(new XAttribute("id",data));
    }
    tDocument.tSave(@"D:\test\APril 2018\testing.xml",myfile);
    Console.ReadLine();
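
A possible efficiency improvement for this loop, offered as a sketch rather than a tested drop-in: info.xml is re-read and re-parsed once per name element, so parsing it once and indexing it by s-name avoids repeating that work. The class and method names (IdLookup, AddIds) are hypothetical, and the sketch assumes both documents were already loaded through tDocument.tParse, so names are compared in the same escaped form (e.g. "John D&#x00F4;e").

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: looks up each <name> in an already-parsed info document and adds its ID.
static class IdLookup
{
    public static void AddIds(XDocument target, XDocument info)
    {
        // Build the name -> ID map once. Students missing <s-name> or <ID> are skipped, and
        // duplicate names keep their first ID, matching the original .First() call.
        Dictionary<string, string> idByName = info.Descendants("student")
            .Where(s => s.Element("s-name") != null && s.Element("ID") != null)
            .GroupBy(s => s.Element("s-name").Value)
            .ToDictionary(g => g.Key, g => g.First().Element("ID").Value);

        // ToList() snapshots the elements before attributes are added, as in the original loop.
        foreach (XElement name in target.Descendants("name").ToList())
        {
            // Unlike .First(), TryGetValue leaves unmatched names alone instead of throwing.
            if (idByName.TryGetValue(name.Value, out string id))
                name.SetAttributeValue("id", id);
        }
    }
}
```

With the paths above it would be called once, as IdLookup.AddIds(myfile, tDocument.tParse(@"D:\test\APril 2018\info.xml")), before tDocument.tSave. Two deliberate differences from the loop: a name with no match is skipped instead of making .First() throw, and SetAttributeValue overwrites an existing id attribute where XElement.Add would throw on a duplicate.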

    testing.xml

    <students>
      <student>
        <name>Jane Doe</name>
        <major>CENT</major>
        <hobby>Music</hobby>
        <age>30</age>
      </student>
      <student>
        <name>John D&#x00F4;e</name>
        <major>PHYS</major>
        <hobby/>
        <age>14</age>
      </student>
      <student>
        <name>Finn B&#x00E1;lor</name>
        <major>WWE</major>
        <hobby/>
        <age>36</age>
      </student>
    </students>

    and info.xml

    <?xml version="1.0"?>
    <students>
      <student>
        <s-name>Jane Doe</s-name>
        <ID>X-7</ID>
      </student>
      <student>
        <s-name>John D&#x00F4;e</s-name>
        <ID>N-4</ID>
      </student>
      <student>
        <s-name>Finn B&#x00E1;lor</s-name>
        <ID>D-22</ID>
      </student>
    </students>

    What I want to know is how I can make my class (tDocument) more efficient, and whether there is a better way of doing what the class is supposed to do.


    Friday, April 13, 2018 6:50 AM

All replies

  • Why are you trying to do this? You are trying to solve your problem in the wrong way. The entity codes are just an accident of the XML standard, and merely provide a safe way to represent extended characters. The guy's name is "John Dôe", not "John D&#x00F4;e", and in your code you should always be working with the name as "John Dôe".

    And why on earth do you care whether the tag is closed with " />" or "/>"?  That's an XML file, and XML does not care.

    What is your overall goal?


    Tim Roberts, Driver MVP Providenza & Boekelheide, Inc.

    Friday, April 13, 2018 9:51 PM
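
The two points above can be checked directly with LINQ to XML. In this minimal sketch (the class name XmlEquivalence and its methods are hypothetical), parsing decodes &#x00F4; into the character ô, and XNode.DeepEquals reports that <hobby /> and <hobby/> produce the same tree.

```csharp
using System.Xml.Linq;

// Hypothetical illustration: character references are decoded on parse, and the
// space before "/>" is not part of the document's data.
static class XmlEquivalence
{
    // Returns the text content of the root element after the parser has decoded references.
    public static string ParsedName(string xml) => XDocument.Parse(xml).Root.Value;

    // True when both strings parse to structurally identical element trees.
    public static bool SameTree(string a, string b) =>
        XNode.DeepEquals(XElement.Parse(a), XElement.Parse(b));
}
```

For example, ParsedName("<name>John D&#x00F4;e</name>") yields "John Dôe", and SameTree("<student><hobby /></student>", "<student><hobby/></student>") is true.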
  • "Why are you trying to do this?"

    Quite simply because my client wants it that way. I don't know why, but they said I need to keep the entity codes in hex format as they are and remove the extra whitespace in the self-closing nodes.

    Saturday, April 14, 2018 12:20 AM
  • Hi Don Bradman,

    >>Quite simply because my client wants it that way. I don't know why, but they said I need to keep the entity codes in hex format as they are and remove the extra whitespace in the self-closing nodes.

    We could use a third-party library to achieve it, like this:

    Please add the NuGet package Mono.HttpUtility.

    using Mono.Web;
    using System;
    using System.Linq;
    using System.Xml.Linq;
    
    namespace ConsoleApp2
    {
        class Program
        {
            static void Main(string[] args)
            {
                XDocument myfile = tDocument.tParse(@"D:\Code\WPF\BackupVersion\ConsoleApp2\testing.xml");
                var names = myfile.Descendants("name").ToList();
                foreach (var name in names)
                {
                    XDocument infofile = tDocument.tParse(@"D:\Code\WPF\BackupVersion\ConsoleApp2\info.xml");
                    var data = infofile.Descendants("student").Where(x => x.Element("s-name").Value == name.Value).Select(y => y.Element("ID").Value).First();
                    name.Value = HttpUtility.HtmlDecode(name.Value);
                    name.Add(new XAttribute("id", data));
                }
                tDocument.tSave(@"D:\Code\WPF\BackupVersion\ConsoleApp2\testing.xml", myfile);
                Console.ReadLine();
            }
        }
    }
    

    Best regards,

    Zhanglong


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Monday, April 16, 2018 7:53 AM
    Moderator
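
For comparison, the same decoding is available without a NuGet package through System.Net.WebUtility.HtmlDecode, which is part of .NET; this swaps in a different API from the one used in the reply. In the minimal sketch below (DecodeSketch is a hypothetical name), note what decoding implies for the output: once the value is decoded, serializing the element writes ô itself rather than &#x00F4;.

```csharp
using System.Net;
using System.Xml.Linq;

// Hypothetical helper mirroring the HtmlDecode step in the reply.
static class DecodeSketch
{
    // Turns "John D&#x00F4;e" (literal text, as left by tParse's escaping) into "John Dôe".
    public static string Decode(string value) => WebUtility.HtmlDecode(value);

    // Shows the effect on output: the decoded element serializes with the raw character.
    public static string DecodeAndSerialize(string escapedText)
    {
        var name = new XElement("name", escapedText);
        name.Value = Decode(name.Value);
        return name.ToString(SaveOptions.DisableFormatting);
    }
}
```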
  • Hi Zhanglong,

    I don't understand what you were trying to show in your post above.

    My question was, how (if possible) can I make my tDocument class more efficient?

    Tuesday, April 17, 2018 12:27 AM
  • Hi Don Bradman,

    >>My question was, how (if possible) can I make my tDocument class more efficient?

    I think it is OK. Are you encountering a performance issue?

    Best regards,

    Zhanglong


    MSDN Community Support

    Tuesday, April 17, 2018 5:45 AM
    Moderator
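
On the efficiency question, a few things in tDocument could be trimmed, although for files of this size the difference is likely small. tSave writes the file twice (doc.Save, then File.WriteAllText over the same path), the constructor and its fields are never used, and inheriting from XDocument adds nothing since every member is static. Below is a minimal sketch of a leaner static helper that keeps the same string-replacement approach; the names RawXml, Escape, Unescape, Parse and Save are hypothetical. It serializes with SaveOptions.DisableFormatting so no indentation is added to the whitespace preserved at load time.

```csharp
using System.IO;
using System.Xml.Linq;

// Hypothetical leaner replacement for tDocument: a static helper instead of an XDocument
// subclass, with one method per direction instead of an int "option" switch.
static class RawXml
{
    // Escape every '&' so references such as "&#x00F4;" survive parsing as literal text.
    public static string Escape(string xml) => xml.Replace("&", "&amp;");

    // Drop the space XmlWriter puts before "/>" in empty elements, then undo the escaping.
    public static string Unescape(string xml) => xml.Replace(" />", "/>").Replace("&amp;", "&");

    public static XDocument Parse(string path) =>
        XDocument.Parse(Escape(File.ReadAllText(path)), LoadOptions.PreserveWhitespace);

    // One serialization and one write; the original first called doc.Save,
    // whose output File.WriteAllText then overwrote.
    public static void Save(string path, XDocument doc) =>
        File.WriteAllText(path, Unescape(doc.ToString(SaveOptions.DisableFormatting)));
}
```

Usage would mirror the original: XDocument myfile = RawXml.Parse(path); ... RawXml.Save(path, myfile);.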