Project Design Concept

  • General discussion

  • I have a project in mind and am looking for advice on which programs, languages, platforms, etc. would be the best way to tackle it. I know there will be lots of different ways to achieve this, but I'm looking for ideas that would be best suited to my skill set and take the least amount of time to learn.

    Project: I want to build a website scraper to retrieve all the fund data from a popular UK investment company's website. I want this process to run once a week and be fully scheduled on Azure, so I don't have to run it myself or leave my PC on. I want it to upload the data to an Azure SQL Database, so that I can access the data from Power BI on a few different machines, such as my laptop while away and my desktop at home.

    Tools: I have Microsoft Azure at my disposal with a modest monthly budget, maybe £10, though I can be flexible.

    Skills: I have a basic understanding of programming and can competently code a few things in VBA, but I am by no means advanced. I also have some previous experience and a basic understanding of setting up SQL databases, but I do mean basic.

    Advice: What I am looking for is an idea of the best way to approach this project given my skill set. I am happy to learn a new language, as I'm pretty certain that's required, and I am happy to take a long time over it, learning as I go. What I'd like to do is make sure I start off in the right direction and learn the best languages for the job, or ones that are similar to what I already know.

    Current Ideas: My thinking so far is that I should learn Python and design the web scraper in it, then learn how to make that run automatically through Azure on a weekly basis. The Python code would write to an Azure SQL Database, which I could then access from a Power BI report. I have already written a program in Excel/VBA that scrapes the website, but I doubt that's the best way to make things work in the cloud.

    Any help greatly appreciated.

    Thanks

    Wednesday, June 28, 2017 3:53 PM

All replies

  • In a nutshell, I'd probably use Azure Automation to run a PowerShell script that does the screen scraping and writes to the database. I'd probably do the screen scrape in PowerShell too, just because I'd already be in that language in Azure Automation. That's going to be the best way within Azure to automate the task on a schedule, and everything will run within Azure that way, with no local machine or anything else.
    Wednesday, June 28, 2017 5:58 PM
  • Hello,

    You may consider the following service. It has a schedule feature.

    http://www.mozenda.com/publish-data-microsoft-azure/


    They put the data in Azure Blob storage accounts as CSV/TXT, and from there you can schedule an upload to SQL Azure. You can use Azure Automation or Azure Data Factory to load the CSV files.

    https://social.msdn.microsoft.com/Forums/azure/en-US/b5cb2acf-e659-422e-b0b1-8b1991402e8d/loading-multiple-files-from-azure-blob-storage-into-azure-sql-database?forum=ssdsgetstarted
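    The CSV-to-database step described above can be sketched in Python. This is only an illustration under stated assumptions: the column names, the fund_prices table, and the sample rows are all hypothetical, and the standard-library sqlite3 module stands in for Azure SQL so the example runs anywhere. Against a real Azure SQL Database you would connect with a driver such as pyodbc, and the SQLite-specific "INSERT OR REPLACE" would need a different statement.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content standing in for a file fetched from blob storage.
SAMPLE_CSV = """fund_name,price,price_date
Example Growth Fund,123.45,2017-06-28
Example Income Fund,98.70,2017-06-28
"""


def load_csv_into_db(conn, csv_text):
    """Insert every CSV row into fund_prices and return the number of rows read."""
    # Hypothetical table; the composite key stops a re-run from duplicating rows.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS fund_prices (
            fund_name  TEXT NOT NULL,
            price      REAL NOT NULL,
            price_date TEXT NOT NULL,
            PRIMARY KEY (fund_name, price_date)
        )
        """
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["fund_name"], float(r["price"]), r["price_date"]) for r in reader]
    # "with conn" commits on success and rolls back if an insert fails.
    with conn:
        # INSERT OR REPLACE is SQLite syntax, used here only for the stand-in.
        conn.executemany(
            "INSERT OR REPLACE INTO fund_prices (fund_name, price, price_date) "
            "VALUES (?, ?, ?)",
            rows,
        )
    return len(rows)


conn = sqlite3.connect(":memory:")  # in-memory stand-in database
loaded = load_csv_into_db(conn, SAMPLE_CSV)
reloaded = load_csv_into_db(conn, SAMPLE_CSV)  # same keys, so rows are replaced
total = conn.execute("SELECT COUNT(*) FROM fund_prices").fetchone()[0]
print(loaded, reloaded, total)
```

    Keying the table on fund name and date means a weekly job that accidentally runs twice overwrites the same rows instead of adding duplicates.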


    Hope this helps.



    Regards,

    Alberto Morillo
    SQLCoffee.com


    Wednesday, June 28, 2017 7:12 PM
  • Thanks Grant, that's a great help. It gives me the direction I need to start looking in. I'm sure I'll get stuck along the way, but at least I have a starting point :)

    Wednesday, June 28, 2017 7:42 PM
  • Thanks Alberto.

    I had a look at the Mozenda site, but their minimum monthly fee was $99, which is more than I'm prepared to pay for this project. If the project were going to deliver that kind of return I would give it a go, but this is more of a hobby project.

    Many thanks for taking the time to answer for me.

    Wednesday, June 28, 2017 7:45 PM
  • One very nice thing about Azure Automation is that there are a bunch of examples on the web site. You may find something similar to what you need and you'll just have to modify it.
    Wednesday, June 28, 2017 8:21 PM
  • Consider the Azure Functions service for hosting your code.

    Super-simple, and you can choose from multiple languages.

    For .NET languages, check out the HTML Agility Pack for interacting with web pages. 

    https://www.nuget.org/packages/HtmlAgilityPack

    David


    Microsoft Technology Center - Dallas
    My blog

    Wednesday, June 28, 2017 8:57 PM
  • The good news is the project is practical, and I think everyone who has been in the field for ten years has written one like that at least twice (or should have, lol).  

    The bad news is that it does require a "stack" of new "skills" beyond modest VBA, especially to fit it onto Azure.

    If you'd just finished one, then had to walk across the street and do it again with minor changes, it might take a week.  But doing it the first time, when you have to master several tools, is going to take ... longer.

    If you have the time, you will feel good about it afterwards, but I just wanted to share this perspective.

    Can it be hosted for £10 per month?  I'd have to look, I've been working rather north of that.  It's an interesting concept if it works, to create little micro-services and host them each for £10.

    Good luck.

    Josh

    Thursday, June 29, 2017 10:19 PM
  • Thanks very much all for your help.

    Having reviewed it a bit, I think I might stick with the original idea of using Python to scrape the website, as it seems a lot of people have done it that way before, so there should be the most material online to help me through.
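    As a rough illustration of what the Python scraping step might look like, here is a minimal sketch using only the standard library. It assumes the fund data sits in a plain HTML table with a header row; the sample markup, fund names, and function names are hypothetical, and a real page may well need a different approach (or a third-party parser).

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left open by malformed markup


def parse_fund_table(html):
    """Return one dict per data row, keyed by the text of the header row."""
    parser = TableRowParser()
    parser.feed(html)
    if not parser.rows:
        return []
    header, *data = parser.rows
    return [dict(zip(header, row)) for row in data]


# Hypothetical sample markup; a real page would be downloaded first,
# for example with urllib.request.urlopen(url).read().decode("utf-8").
SAMPLE_HTML = """
<table>
  <tr><th>Fund</th><th>Price</th></tr>
  <tr><td>Example Growth Fund</td><td>123.45</td></tr>
  <tr><td>Example Income Fund</td><td>98.70</td></tr>
</table>
"""

funds = parse_fund_table(SAMPLE_HTML)
print(funds)
```

    Keeping the parsing in a function that takes a string makes it easy to test against a saved copy of the page before anything is scheduled in the cloud.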

    I will then need to work out some way to link that with Azure Automation, which is probably where I'm going to get most stuck... but I'll cross that bridge when I get to it!

    I'll post back on progress over the coming weeks (or, most likely, months) for those interested :)

    Friday, June 30, 2017 12:13 PM
  • I think Prisync is a cheaper alternative for such a project. Check out their pricing page and API to see whether that works for you.

    https://prisync.com
    Wednesday, December 4, 2019 1:33 PM