C#: how can I parse JSON from an HTML page?

  • Question

  • OK, I tried using regex and wasted about 4 hours of my time trying to get it to work (it doesn't, haha). The issue I'm having is that this string is all on one line:

    json in html example

    What I'm trying to do is parse out the name (value here),
    and then parse out the width and height (values here).

    Finally, if the height and width match 2048 or 4095, it should find the link and download the image.

    The problem is that I have never touched JSON. I made a regex for the pattern of the whole block, but of course regex and C# really don't get along well, as I found out the hard way.

                WebClient elftestweb = new WebClient();
                String searchquery = "https://sketchfab.com/3d-models/kp-31-submachine-gun-387c30629c164b59ba862ca2c2d4c951";
                String scrapdata;
                scrapdata = elftestweb.DownloadString(searchquery);
                //(&#[0-9]+;[a-z]+&#[0-9]+;:\s[0-9]+\},\s{)
                MatchCollection fullregex = Regex.Matches(scrapdata, @"(&#[0-9]+;[a-zA-Z]+&#[0-9]+;:\s&#[0-9]+;[0-9]+-[0-9]+-[0-9]+[A-Z0-9a-z]+:[0-9]+:[0-9]+.[0-9]+&#[0-9]+;\}\],\s&#[0-9]+;[a-zA-Z]+&#[0-9]+;:\s&#[0-9]+;[a-z]+&#[0-9]+)", RegexOptions.Singleline);
                foreach (Match elfmatch in fullregex)
                {
                    String elftestparse = elfmatch.Groups[1].Value;
                    MessageBox.Show(elftestparse);
                }

    That's the code I currently have, along with the regex, but it just skips right past the loop and never goes in. The pattern matches fine in Notepad++, but C# seems to have trouble with it. I also tried Html Agility Pack and got the same sort of problem. The string is all on one line, so I'm not sure why I can't get the values I need. Hopefully someone can shed some light on it for me.

    Thank you in advance, elfenliedtopfan5

    Monday, May 20, 2019 3:25 PM

All replies

  • What are you trying to do? That is not valid JSON.

    william xifaras

    Monday, May 20, 2019 3:45 PM
  • I'm trying to parse the name and get a value like Attachrail01_metallic.jpg,

    and then check whether the width and height are equal to 2048 x 2048 or 4095 x 4095.

    If they match, look for the corresponding link: look down from where it found the height, then to the right until it finds a link.

    one of the links

    Once it has that, download the image and give it the name value found earlier. (There are about 20 occurrences of these kinds of links; I want to loop through them, find the ones with the right sizes, and download them to my computer with the given name.)

    I'm not sure how to parse it, though. I tried regex but it's not working well for me.

    Monday, May 20, 2019 4:59 PM
  • OK, but that's not valid JSON.

    You should use Html Agility Pack to parse the DOM.

    https://html-agility-pack.net/


    william xifaras

    Monday, May 20, 2019 5:02 PM
  • Yeah, I have this downloaded, but I'm not sure what I have to do. If it's not JSON or HTML then I don't know how to get this to work. And what do you mean by "parse the DOM"?
    Monday, May 20, 2019 5:15 PM
  • The library will do the parsing for you. You need to look at the documentation.

    Main docs

    https://html-agility-pack.net/documentation

    Parse from Web docs

    https://html-agility-pack.net/from-web


    william xifaras

    Monday, May 20, 2019 5:18 PM
  • I've been looking at it and trying my best to load the stuff I need, but it won't go inside the div I need it to:

                var url = "https://sketchfab.com/3d-models/steyr-aug-a3-4cea993b9f0d47c6b1beed7877b17447";
    
                var httpClient = new HttpClient();
                var html = await httpClient.GetStringAsync(url);
    
                var htmlDocument = new HtmlAgilityPack.HtmlDocument();
                htmlDocument.LoadHtml(html);
    
                var sfdiv = htmlDocument.DocumentNode.Descendants("div class=\"dom - data - container\"")
                    .Where(node => node.GetAttributeValue("id", "")
                    .Equals("js-dom-data-prefetched-data")).ToList();
    
    
                var sfitems = sfdiv[0].Descendants("div class=\"dom - data - container\"")
                    .Where(node => node.GetAttributeValue("id", "")
                    .Contains("dist")).ToList();
                MessageBox.Show(sfdiv.ToString());

    But I still can't get it to parse certain things. The code above works a little: it rips certain things off the site, but not the big chunk I need.

    Monday, May 20, 2019 8:08 PM
  • See if you can use Selectors

    https://html-agility-pack.net/selectors

    That library does what it's supposed to do.


    william xifaras

    Monday, May 20, 2019 8:24 PM
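
A minimal sketch of the selector approach, assuming the HtmlAgilityPack NuGet package is referenced. The div id js-dom-data-prefetched-data comes from the code posted earlier in the thread; the class name PrefetchedDataFinder and the method GetInnerHtmlById are hypothetical names chosen for illustration. Descendants takes a bare element name such as "div", so attribute filters belong in the XPath (or in a Where clause) rather than in the name string.

```csharp
// Requires the HtmlAgilityPack NuGet package.
using System;
using HtmlAgilityPack;

public static class PrefetchedDataFinder
{
    // Returns the raw inner HTML of the div with the given id,
    // or null when no such div exists in the document.
    public static string GetInnerHtmlById(string html, string id)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // XPath: any div whose id attribute equals the requested value.
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@id='" + id + "']");
        return node?.InnerHtml;
    }
}
```

Because the lookup is by id, no full XPath from the browser's developer tools is needed. The HTML string can come from HttpClient.GetStringAsync as in the code above.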
  • I really want to do that, but I can't get the XPath of the div class; it doesn't show up. Also, the stuff I need isn't in the main HTML on the site; it only shows up when the HTML is loaded in C#, for some strange reason.

    So I'm not sure how to use it without XPath.
    Monday, May 20, 2019 9:08 PM
  • Hi 

    Thank you for posting here.

    According to your description, you want to get the JSON from an HTML page.

    Based on my test, I could not get the JSON data by using the WinForms WebBrowser control.

    I think the reason is that the JSON string has been commented out.

    You can see this in the following picture: the marker “<!--” means that the JSON has been commented out.

    I used the following code.

    private void Form1_Load(object sender, EventArgs e)
    {
        // Navigate the WebBrowser control to the model page.
        webBrowser1.Url = new Uri("https://sketchfab.com/3d-models/steyr-aug-a3-4cea993b9f0d47c6b1beed7877b17447");
    }

    private void Button1_Click(object sender, EventArgs e)
    {
        HtmlDocument document = webBrowser1.Document;
        var t = document.GetElementsByTagName("div");
        int count = 1;
        foreach (HtmlElement item in t)
        {
            // Show the outer HTML of the first div only.
            if (count == 1)
            {
                string name = item.OuterHtml;
                MessageBox.Show(name);
            }
            count++;
        }
    }

    Best Regards,

    Jack


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Tuesday, May 21, 2019 2:49 AM
    Moderator
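
When the page is downloaded as a plain string (as with WebClient or HttpClient earlier in the thread), the commented-out block is still part of that string, so it may be possible to read it without going through the DOM. The sketch below assumes the container's inner HTML is a single comment whose body is entity-encoded text; the class name CommentedJson and the method Decode are hypothetical. It strips the comment markers and decodes entities such as &#34; (a double quote) using only the base class library.

```csharp
using System;
using System.Net;

public static class CommentedJson
{
    // Takes the inner HTML of the container div, for example
    // "<!--{&#34;a&#34;: 1}-->", and returns the decoded text
    // between the comment markers.
    public static string Decode(string innerHtml)
    {
        const string open = "<!--";
        const string close = "-->";
        string text = innerHtml.Trim();

        // Strip the markers only when both are present and do not overlap.
        if (text.Length >= open.Length + close.Length
            && text.StartsWith(open, StringComparison.Ordinal)
            && text.EndsWith(close, StringComparison.Ordinal))
        {
            text = text.Substring(open.Length, text.Length - open.Length - close.Length);
        }

        // &#34; is the HTML numeric entity for a double quote.
        return WebUtility.HtmlDecode(text).Trim();
    }
}
```

Whether the decoded text is valid JSON depends on the page; that part is an assumption and should be checked against the real output.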
  • That is amazing, thank you. The only other thing I'm confused about is how to access a certain string. Say I need to find the value of the name property and display it: is that possible? Sorry, I haven't worked with HTML much before, so parsing this is quite difficult, but hopefully it's possible. For example, part of the string in there will have

    &#34;name&#34;: &#34;AUGA3_Substance2_LP_Base_1_Normal.png&#34;,

    then the width and height in the same format as above, and finally the URL in the same format.

    I'm trying my best to rip images from this depending on their size, and I want to give them a name on export, which I know can be done with WebClient.

    I'm just not sure how to store and save the values.

    Tuesday, May 21, 2019 2:36 PM
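
One possible way to store the values and pick out the right sizes, once the entities have been decoded into JSON text. The real structure of the page's data is not shown in this thread, so the input shape below (a flat array of objects with name, width, height and url) is purely an assumption for illustration, as are the names ImageInfo, ImageFilter and SquareImagesOfSize. The sketch uses System.Text.Json, which ships with .NET Core 3.0 and later and is available as a NuGet package for .NET Framework.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Simple container for the values of one image entry.
public sealed class ImageInfo
{
    public string Name { get; set; }
    public int Width { get; set; }
    public int Height { get; set; }
    public string Url { get; set; }
}

public static class ImageFilter
{
    // Assumed (hypothetical) input shape: a JSON array of flat objects, e.g.
    // [{"name": "a.png", "width": 2048, "height": 2048, "url": "https://..."}]
    // Returns the entries whose width equals their height and is one of 'sizes'.
    public static List<ImageInfo> SquareImagesOfSize(string json, params int[] sizes)
    {
        var result = new List<ImageInfo>();
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            foreach (JsonElement item in doc.RootElement.EnumerateArray())
            {
                int width = item.GetProperty("width").GetInt32();
                int height = item.GetProperty("height").GetInt32();
                if (width == height && Array.IndexOf(sizes, width) >= 0)
                {
                    result.Add(new ImageInfo
                    {
                        Name = item.GetProperty("name").GetString(),
                        Width = width,
                        Height = height,
                        Url = item.GetProperty("url").GetString()
                    });
                }
            }
        }
        return result;
    }
}
```

Each returned entry could then be saved with WebClient.DownloadFile(info.Url, info.Name), using the name as the file name. If the real data nests these properties differently, the property lookups would need to follow that nesting.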
  • Hi 

    Thanks for the feedback.

    >> how can I access a certain string, say I need to find the value of the name property and display it

    I am not sure what you mean. If you want to get the value of the HTML name property, you could use HtmlDocument.GetElementsByTagName.

    Best Regards,

    Jack


    Wednesday, May 22, 2019 6:27 AM
    Moderator
  • The issue I'm having is this:

    the div is always empty when I put in a name. The full div I need to access to get the line I want is this:

    <div class="dom-data-container" style="display:none;" id="js-dom-data-prefetched-data">

    That's the issue. I think the div class lookup might be going into the wrong div. Is there a way to make it go into this one by element name?

    Friday, May 24, 2019 5:28 PM
  • Hi

    I noticed that you are trying to get the JSON data. As I said, we could not get the data that has been commented out by “<!--”. If you want to use the GetAttribute method, your parameter should be class, style or id instead of "&#34".

    Best Regards,

    Jack


    Tuesday, May 28, 2019 7:50 AM
    Moderator