Count the URLs in XML files
-
Tuesday, November 22, 2011 1:54 PM
Hi, I have more than one thousand (1000) XML files, with names like rud.xml, rod02.xml and so on, in a Temp folder. Together they contain more than fifty thousand (50000) URLs. The structure of the XML files is shown below. My requirement is to count the URLs across all the XML files in the Temp folder and show the count in a Label control; I am working in C# WinForms. For example, if one XML file contains 100 URLs and another contains 200 URLs, then after the loop completes the label should show the count 300.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url><loc>http://www.abc.com/pub/valentina-galetta/43/298/9A5</loc></url>
<url><loc>http://www.cool.com/pub/isoboye-george/43/298/9A4</loc></url>
<url><loc>http://www.good.com/pub/evcil-sever/43/298/9A3</loc></url>
<url><loc>http://www.fond.com/pub/yogesh-talekar/43/298/9A2</loc></url>
<url><loc>http://www.solid.com/pub/viriginia-guerrero/43/298/9A1</loc></url>
<url><loc>http://www.jeo.com/pub/bekir-3f-3feng-3f-3fn/43/298/9A0</loc></url>
<url><loc>http://www.fan.com/pub/marco-rossini/43/298/99B</loc></url>
<url><loc>http://www.force.com/pub/celia-jim-3f-3fnez/43/298/99A</loc></url>
<url><loc>http://www.super.com/pub/thumma-ramakrishna/43/298/999</loc></url>
<url><loc>http://www.duper.com/pub/natalia-esp-3f-3fndola/43/298/997</loc></url>
<url><loc>http://www.look.com/pub/nikhil-parashar/43/298/996</loc></url>
<url><loc>http://www.my.com/pub/andr-3f-3fs-plaza-20-editorial-20alianza-/43/298/995</loc></url>
<url><loc>http://www.shop.com/pub/marcel-erdmann/43/298/993</loc></url>
<url><loc>http://www.poo.com/pub/enrique-cortes/43/298/992</loc></url>
<url><loc>http://www.in.com/pub/prajyot-dhumal/43/298/990</loc></url>
<url><loc>http://www.out.com/pub/milton-rodrigues/43/298/98B</loc></url>
<url><loc>http://www.week.com/pub/rolf-schmitz/43/298/98A</loc></url>
<url><loc>http://www.strong.com/pub/anil-anil/43/298/988</loc></url>
<url><loc>http://www.qual.com/pub/joe-soap/43/298/986</loc></url>
<url><loc>http://www.time.com/pub/silviu-pintea/43/298/985</loc></url>
<url><loc>http://www.super.com/pub/paulo-primo/43/298/983</loc></url>
</urlset>
- Edited by mohammad salmaan Tuesday, November 22, 2011 1:59 PM
All Replies
-
Tuesday, November 22, 2011 2:33 PM
Assuming the XML structure won't change:
using System.IO;
using System.Xml;

// Sums the URL counts of every file in the folder.
int GetUrlCount(string folderPath)
{
    int count = 0;
    string[] files = Directory.GetFiles(folderPath);
    foreach (string file in files)
    {
        count += ParseAndGetCount(file);
    }
    return count;
}

// Counts the child nodes of the root element (one <url> node per URL).
int ParseAndGetCount(string file)
{
    XmlDocument doc = new XmlDocument();
    doc.LoadXml(File.ReadAllText(file));
    // DocumentElement is the root <urlset>, with or without an XML declaration.
    XmlElement root = doc.DocumentElement;
    if (root != null && root.HasChildNodes)
        return root.ChildNodes.Count;
    return 0;
}
Usage:
// Change the folder path as per your requirement.
lblCount.Text = GetUrlCount("c:\\temp").ToString();
Thanks,
A.m.a.L Hashim

Dot Net Goodies
- Marked As Answer by mohammad salmaan Thursday, November 24, 2011 12:48 PM
-
Tuesday, November 22, 2011 5:07 PM
@A.m.a.l, you didn't handle the xml namespace.
@mohammad, try this:
using System;
using System.IO;
using System.Linq;
using System.Windows.Forms;
using System.Xml.Linq;

namespace XmlUrlCountWinApp
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e)
        {
            try
            {
                label1.Text = GetUrlCountFromTempDir().ToString();
            }
            catch (Exception ex)
            {
                label1.Text = ex.Message;
            }
        }

        private int GetUrlCountFromTempDir()
        {
            string tempPath = Environment.GetEnvironmentVariable("TEMP");
            if (tempPath != null)
            {
                string[] files = Directory.GetFiles(tempPath, "*.xml");
                return files.Sum(file => GetUrlCountFromXml(file));
            }
            throw new Exception("The environment variable %TEMP% is not defined.");
        }

        private int GetUrlCountFromXml(string xmlPath)
        {
            XNamespace xn = "http://www.sitemaps.org/schemas/sitemap/0.9";
            return XElement.Load(xmlPath).Descendants(xn + "loc").Count();
        }
    }
}
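To illustrate the namespace point, here is a minimal sketch, not part of the posted code. The class name NamespaceDemo, the SampleXml string and the example.com URLs are made up for illustration. It shows that looking up "loc" by its bare name finds nothing in a sitemap document, while a namespace-qualified lookup, or simply counting the root's direct children, gives the expected count:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class NamespaceDemo
{
    // Hypothetical two-entry sample in the sitemap format from the question.
    public const string SampleXml =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
        "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">" +
        "<url><loc>http://www.example.com/a</loc></url>" +
        "<url><loc>http://www.example.com/b</loc><changefreq>weekly</changefreq></url>" +
        "</urlset>";

    // Looks for <loc> elements in no namespace; the sitemap elements are
    // all in the sitemap namespace, so this finds none of them.
    public static int CountWithoutNamespace(string xml)
    {
        return XDocument.Parse(xml).Descendants("loc").Count();
    }

    // Qualifies the element name with the sitemap namespace.
    public static int CountWithNamespace(string xml)
    {
        XNamespace ns = "http://www.sitemaps.org/schemas/sitemap/0.9";
        return XDocument.Parse(xml).Descendants(ns + "loc").Count();
    }

    // Counts the direct child elements of the root, whatever their names.
    public static int CountRootChildren(string xml)
    {
        return XDocument.Parse(xml).Root.Elements().Count();
    }
}
```

Counting root children never looks at element names, which is why that style of code is not affected by the namespace; it does assume that every child of the root is a url entry.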
aelassas.free.fr
-
Wednesday, November 23, 2011 7:01 AM
Try this, boss:
// Requires: using System.IO; using System.Linq; using System.Xml.Linq;
DirectoryInfo di = new DirectoryInfo(@"c:\temp\check");
string filepath;
int count = 0;
foreach (FileInfo fi in di.GetFiles("*.xml"))
{
    filepath = fi.FullName;
    XDocument xdoc = XDocument.Load(filepath);
    // Each direct child of the root element is one <url> entry.
    count = count + xdoc.Root.Elements().Count();
}
- Marked As Answer by mohammad salmaan Thursday, November 24, 2011 12:48 PM
-
Wednesday, November 23, 2011 1:08 PM
Thanks to all of you for replying. Some of the files are actually in the format shown below.
If we want to count the URLs for the XML files below as well, what changes are needed in the code above?
<?xml version="1.0" encoding="UTF-8"?> <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> <url><loc>http://www.linkedin.com/directory/companies/internet-web2.0-startups-social-networking/barcelona.html</loc><changefreq>weekly</changefreq></url> <url><loc>http://www.linkedin.com/directory/companies/internet-web2.0-startups-social-networking/basel.html</loc><changefreq>weekly</changefreq></url> <url><loc>http://www.linkedin.com/directory/companies/internet-web2.0-startups-social-networking/bath.html</loc><changefreq>weekly</changefreq></url> <url><loc>http://www.linkedin.com/directory/companies/computer-networking/sheffield.html</loc><changefreq>weekly</changefreq></url> <url><loc>http://www.linkedin.com/directory/companies/computer-networking/singapore.html</loc><changefreq>weekly</changefreq></url> <url><loc>http://www.linkedin.com/directory/companies/computer-networking/slough.html</loc><changefreq>weekly</changefreq></url> <url><loc>http://www.linkedin.com/directory/companies/computer-networking/slovak-republic.html</loc><changefreq>weekly</changefreq></url> <url><loc>http://www.linkedin.com/directory/companies/computer-networking/slovenia.html</loc><changefreq>weekly</changefreq></url> <url><loc>http://www.linkedin.com/directory/companies/computer-networking/south-africa.html</loc><changefreq>weekly</changefreq></url> <url><loc>http://www.linkedin.com/directory/companies/computer-networking/spain.html</loc><changefreq>weekly</changefreq></url> <url><loc>http://www.linkedin.com/directory/companies/computer-networking/spokane.html</loc><changefreq>weekly</changefreq></url> <url><loc>http://www.linkedin.com/directory/companies/computer-networking/sri-lanka.html</loc><changefreq>weekly</changefreq></url> <url><loc>http://www.linkedin.com/directory/companies/computer-networking/st-louis.html</loc><changefreq>weekly</changefreq></url> 
<url><loc>http://www.linkedin.com/directory/companies/computer-networking/stevenage.html</loc><changefreq>weekly</changefreq></url> <url><loc>http://www.linkedin.com/directory/companies/computer-networking/stockholm.html</loc><changefreq>weekly</changefreq></url> <url><loc>http://www.linkedin.com/directory/companies/computer-networking/sweden.html</loc><changefreq>weekly</changefreq></url> </urlset>
<?xml version="1.0" encoding="UTF-8"?> <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> <url><loc>http://www.linkedin.com/groups/gid-2431604</loc><changefreq>monthly</changefreq></url> <url><loc>http://www.linkedin.com/groups/gid-2430868</loc><changefreq>monthly</changefreq></url> <url><loc>http://www.linkedin.com/groups/Wireless-Carrier-Reps-Past-Present-2430807</loc><changefreq>monthly</changefreq></url> <url><loc>http://www.linkedin.com/groups/gid-2430694</loc><changefreq>monthly</changefreq></url> <url><loc>http://www.linkedin.com/groups/gid-2430575</loc><changefreq>monthly</changefreq></url> <url><loc>http://www.linkedin.com/groups/gid-2431452</loc><changefreq>monthly</changefreq></url> <url><loc>http://www.linkedin.com/groups/gid-2432377</loc><changefreq>monthly</changefreq></url> <url><loc>http://www.linkedin.com/groups/gid-2428508</loc><changefreq>monthly</changefreq></url> <url><loc>http://www.linkedin.com/groups/gid-2432379</loc><changefreq>monthly</changefreq></url> <url><loc>http://www.linkedin.com/groups/gid-2432380</loc><changefreq>monthly</changefreq></url> <url><loc>http://www.linkedin.com/groups/gid-2432381</loc><changefreq>monthly</changefreq></url> <url><loc>http://www.linkedin.com/groups/gid-2432383</loc><changefreq>monthly</changefreq></url> <url><loc>http://www.linkedin.com/groups/gid-2432384</loc><changefreq>monthly</changefreq></url> <url><loc>http://www.linkedin.com/groups/gid-2432385</loc><changefreq>monthly</changefreq></url> <url><loc>http://www.linkedin.com/groups/gid-2432388</loc><changefreq>monthly</changefreq></url> <url><loc>http://www.linkedin.com/groups/gid-2432391</loc><changefreq>monthly</changefreq></url> </urlset>
salman
- Edited by mohammad salmaan Wednesday, November 23, 2011 1:09 PM
-
Wednesday, November 23, 2011 1:16 PM
Have you tried executing the code I posted? I tested it and it works for both formats.
Thanks,
A.m.a.L Hashim

Dot Net Goodies
-
Thursday, November 24, 2011 6:13 AM
Yes, mine as well. We are not walking through each child node, only counting the direct children of the root, so it will work fine for the structure mentioned above as well.
- Edited by Prahalnathan Thursday, November 24, 2011 6:14 AM
-
Thursday, November 24, 2011 6:52 AM
Thanks for replying. When I use Hashim's code, it gives me this error:
' ', hexadecimal value 0x1F, is an invalid character. Line 1, position 1.
When I run Prahalnathan's code, it does not show the count for XML files with the above structure; at least 22 XML files are in this format. Waiting for a reply.
salman
- Edited by mohammad salmaan Thursday, November 24, 2011 7:27 AM
-
Thursday, November 24, 2011 8:09 AM
Boss,
in your XML the first character (Line 1, position 1) is wrong. The XML file should start with <....
Check that and reply.
-
Thursday, November 24, 2011 8:30 AM
Hi Prahalnathan
The file that shows the error is the first file; its structure is shown below. The error appears when I run Hashim's code. Waiting for a reply.
<?xml version="1.0" encoding="UTF-8"?> <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> <url><loc>http://www.linkedin.com/company/default%20value</loc><changefreq>monthly</changefreq></url> <url><loc>http://www.linkedin.com/company/time-warner-inc.</loc><changefreq>monthly</changefreq></url> <url><loc>http://www.linkedin.com/company/teledyne</loc><changefreq>monthly</changefreq></url> <url><loc>http://www.linkedin.com/company/telefonica-europe-plc</loc><changefreq>monthly</changefreq></url> <url><loc>http://www.linkedin.com/company/rational-software</loc><changefreq>monthly</changefreq></url> <url><loc>http://www.linkedin.com/company/informix-software</loc><changefreq>monthly</changefreq></url> <url><loc>http://www.linkedin.com/company/unicible</loc><changefreq>monthly</changefreq></url> <url><loc>http://www.linkedin.com/company/ibm-global-services</loc><changefreq>monthly</changefreq></url> <url><loc>http://www.linkedin.com/company/societe_conseil_groupe_lgs</loc><changefreq>monthly</changefreq></url> <url><loc>http://www.linkedin.com/company/pureatria</loc><changefreq>monthly</changefreq></url> </urlset>
salman
-
Thursday, November 24, 2011 9:22 AM
Check that there is no character (not even a space) before the first <, because for me both of our code samples work fine for the XML shown above.
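One way to check what really sits before the first < is to look at the raw bytes rather than the text. The following is only a sketch; the class name LeadingBytes and its helper methods are made up for illustration and are not from any reply in this thread:

```csharp
using System;
using System.IO;

public static class LeadingBytes
{
    // Formats up to maxBytes leading bytes as hex; a file that
    // begins with "<?xm" gives "3C-3F-78-6D".
    public static string Describe(byte[] data, int maxBytes)
    {
        int length = Math.Min(data.Length, maxBytes);
        return length == 0 ? string.Empty : BitConverter.ToString(data, 0, length);
    }

    // True when the very first byte is '<' (0x3C).
    // Note: a UTF-8 byte order mark (EF-BB-BF) before '<' is legitimate,
    // so a false result is a hint to look closer, not proof of a bad file.
    public static bool StartsWithAngleBracket(byte[] data)
    {
        return data.Length > 0 && data[0] == 0x3C;
    }

    // Hypothetical helper: reads a file and reports on its first 8 bytes.
    public static string DescribeFile(string path)
    {
        byte[] data = File.ReadAllBytes(path);
        string verdict = StartsWithAngleBracket(data)
            ? " (starts with '<')"
            : " (does not start with '<')";
        return Describe(data, 8) + verdict;
    }
}
```

Running DescribeFile on the file that fails and on one that loads would show whether the two really begin the same way.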
- Edited by Prahalnathan Thursday, November 24, 2011 9:25 AM
-
Thursday, November 24, 2011 9:39 AM
Hi,
Did you try my code? I have tested it with your files and it works well.
Kind regards,
aelassas.free.fr
-
Thursday, November 24, 2011 9:44 AM
Hi Prahalnathan
Can you send me your email ID so that I can send you the file? Then you can check it on your system.
salman
- Edited by mohammad salmaan Thursday, November 24, 2011 9:46 AM
-
Thursday, November 24, 2011 9:45 AM
Boss, all the code samples are working fine. I think the problem is in the XML file.
-
Thursday, November 24, 2011 9:56 AM
OK, my brother (mere bhai),
but the same file works fine with your code.
salman
-
Thursday, November 24, 2011 10:03 AM
Yes, my ID is masilaster@gmail.com. Send me the file, along with its file name; if possible, send the whole folder.
-
Thursday, November 24, 2011 10:37 AM
I have sent you the file. Run it with your code.
salman
-
Thursday, November 24, 2011 11:18 AM
Hey boss,
the problem is with this line:
<url><loc>http://www.linkedin.com/directory/companies/computer-networking/s‹o-paulo.html</loc><changefreq>weekly</changefreq></url>
The ‹ character is the problem.
My code loads the file with XDocument, so it throws an exception, as ‹ is not a valid character in XML.
So use Hashim's code: he loads the XML content as a string, so it will work fine for that file.
Post back about how it goes.
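If the file has to be counted despite a stray character, one option is to drop characters that XML 1.0 does not allow before parsing. This is only a sketch under assumptions: the class name LenientCount and its methods are made up for illustration, and whether a file cleaned this way is still trustworthy depends on why the character is there in the first place, which this thread does not establish.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Xml.Linq;

public static class LenientCount
{
    // Character ranges allowed by the XML 1.0 Char production, tested per
    // UTF-16 code unit. Surrogates are let through so that valid pairs survive.
    public static bool IsAllowed(char c)
    {
        return c == 0x9 || c == 0xA || c == 0xD
            || (c >= 0x20 && c <= 0xD7FF)
            || (c >= 0xE000 && c <= 0xFFFD)
            || char.IsSurrogate(c);
    }

    // Returns the text with every disallowed character (for example 0x1F) removed.
    public static string StripInvalidXmlChars(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            if (IsAllowed(c)) sb.Append(c);
        }
        return sb.ToString();
    }

    // Parses the cleaned text and counts the direct children of the root.
    public static int CountUrls(string text)
    {
        return XDocument.Parse(StripInvalidXmlChars(text)).Root.Elements().Count();
    }
}
```

A call such as LenientCount.CountUrls(File.ReadAllText(path)) would then replace the direct load; note that the removed characters are silently lost, so the URLs themselves may no longer match the originals.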
-
Thursday, November 24, 2011 11:24 AM
OK Prahalnathan,
I have also run Hashim's code on that XML file, but it gives me a count of 0.
salman
-
Thursday, November 24, 2011 11:33 AM
I got a count of 49999 by copying and pasting the code below:
XmlDocument doc = new XmlDocument();
doc.LoadXml(System.IO.File.ReadAllText("C:\\d_0.xml"));
if (doc.ChildNodes.Count > 0)
    if (doc.ChildNodes[1].HasChildNodes)
        return doc.ChildNodes[1].ChildNodes.Count;
Are you getting any error?
-
Thursday, November 24, 2011 11:50 AM Moderator
Why open the files in XML mode? Too hard... use regular expressions for this pattern matching.
// Requires: using System; using System.IO; using System.Linq; using System.Text.RegularExpressions;
string pattern = @"(<url>\s*<loc>)";
var count = Directory.EnumerateFiles(@"D:\temp", "*.xml")
    .Sum(fl => Regex.Matches(File.ReadAllText(fl), pattern)
        .OfType<Match>()
        .Count());
Console.WriteLine("Total Urls " + count);
Tested on both of the XML types you showed. HTH
William Wegerson (www.OmegaCoder.Com)
- Edited by OmegaManMVP, Moderator Thursday, November 24, 2011 11:55 AM
-
Thursday, November 24, 2011 12:47 PM
Thanks, Prahalnathan, for replying.
The fix was in the return value: the int GetcountUrl() method in brother Hashim's code has to return count. Now I am getting the full URL count. Thanks, Hashim.
salman
-
Thursday, November 24, 2011 1:14 PM
Cool, brother. If you have any other doubts, do post.
-
Thursday, November 24, 2011 1:25 PM
Have you tried with Linq To XML (the code I provided in my first reply)?
aelassas.free.fr
-
Friday, November 25, 2011 2:00 AM Moderator
Hi mohammad salmaan,
Welcome to the MSDN forum. Very glad to hear that you have fixed your issue. If you have any problems, please feel free to post in the forum; there are lots of specialists with a fantastic level of technical skill here, like Prahalnathan, Link.fr, OmegaMan and A.m.a.L, and they can help you solve issues effectively.
Best Regards.
Allen Li [MSFT]
MSDN Community Support | Feedback to us
-
Friday, November 25, 2011 6:10 PM Moderator
> Have you tried with Linq To XML (the code I provided in my first reply) ?
> aelassas.free.fr
I took notice! I created a blog article on the three different suggestions and timed them. The marked answer, the XmlDocument approach, was the slowest! Guess which was the fastest.
Find out here:
.Net Regex: Can Regular Expression Parsing be Faster than XmlDocument or Linq to Xml?
William Wegerson (www.OmegaCoder.Com)
- Edited by OmegaManMVP, Moderator Friday, November 25, 2011 6:11 PM
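For anyone who wants to repeat such a comparison locally, here is a rough timing sketch. It is not the code from the blog article; the class name TimingSketch, the generated sample data and the example.com URLs are assumptions made for illustration. It runs the three counting styles from this thread over the same in-memory XML and measures each with a Stopwatch:

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Xml;
using System.Xml.Linq;

public static class TimingSketch
{
    private const string SitemapNs = "http://www.sitemaps.org/schemas/sitemap/0.9";

    // Builds a synthetic sitemap with the requested number of <url> entries.
    public static string BuildSample(int urlCount)
    {
        var sb = new StringBuilder();
        sb.Append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
        sb.Append("<urlset xmlns=\"").Append(SitemapNs).Append("\">");
        for (int i = 0; i < urlCount; i++)
        {
            sb.Append("<url><loc>http://www.example.com/page/").Append(i).Append("</loc></url>");
        }
        sb.Append("</urlset>");
        return sb.ToString();
    }

    // XmlDocument style: count the child nodes of the root element.
    public static int CountWithXmlDocument(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.DocumentElement.ChildNodes.Count;
    }

    // LINQ to XML style: count namespace-qualified <loc> descendants.
    public static int CountWithLinq(string xml)
    {
        XNamespace ns = SitemapNs;
        return XDocument.Parse(xml).Descendants(ns + "loc").Count();
    }

    // Regex style: count "<url><loc>" openings without parsing the XML.
    public static int CountWithRegex(string xml)
    {
        return Regex.Matches(xml, @"<url>\s*<loc>").Count;
    }

    // Runs the counter the given number of times and returns the total elapsed time.
    public static TimeSpan Time(Func<string, int> counter, string xml, int iterations)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            counter(xml);
        }
        sw.Stop();
        return sw.Elapsed;
    }
}
```

For example, TimingSketch.Time(TimingSketch.CountWithRegex, TimingSketch.BuildSample(50000), 20) times the regex variant. The numbers depend on the machine, the runtime and the data, so treat any ranking as specific to that run; reading the files from disk, as the real task does, is not measured here.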
-
Friday, November 25, 2011 7:12 PM
> the marked answer the xmldocument process was the slowest!!!....guess who was the fastest.
> Find out here:
> .Net Regex: Can Regular Expression Parsing be Faster than XmlDocument or Linq to Xml?
> William Wegerson (www.OmegaCoder.Com)
In conclusion, the final ranking on this thread is:
- OmegaMan
- Link.fr
- Hashim
aelassas.free.fr
- Edited by Link.fr Friday, November 25, 2011 7:13 PM
-
Saturday, November 26, 2011 4:43 AM
- Edited by Malobukv Saturday, November 26, 2011 4:46 AM
-
Saturday, November 26, 2011 9:34 PM Moderator
> the marked answer the xmldocument process was the slowest
very rarely. take a look here
He used different regex options that were unnecessary and that significantly slowed his regex tests down. See his link, where I gave my reply.
William Wegerson (www.OmegaCoder.Com)
- Edited by OmegaManMVP, Moderator Tuesday, November 29, 2011 2:49 PM

