none
A few simple questions. RRS feed

  • Question

  • EDIT: I just realized this could be in the wrong section of the forum. And I also just saw the forum, where I could ask where I should post this question. My sincere apologizes.

    Hello everyone,

    Recently I decided to learn a bit C# for fun. Only my problem with practicing is, is the fact that I never know what programs to make. So by brother said, "why don't you make a web spider?".

    I thought, yeah that's a pretty good idea, that way I learn how to connect to the Internet, using databases and regular expressions. It doesn't have to be a fast one, because I'm not planning on spidering the whole Internet, but i tried to make it decent.

    My knowledge is very basic, this is the first program I ever made, so don't expect any high-leveled coding. I do want to know exactly what I'm doing. So I also tried not to rip code just from the Internet and put it in my application, but understand the code.

    Anyways, I made a simple database using Microsoft Access. The file name is database.mdb. It contains the databases:

    -EmailAdresses
    -URLS

    EmailAdresses has the columns:
    -id
    -url
    -dateAdded
    -parentPage

    URLS has the columns:
    -id
    -emailAdress
    -dateAdded
    -parentPage

    How does my spider work? It reads the URLS database for a URL to spider. Then I downloads the source code, and runs some of my regexes and other string manipulation functions I wrote. So it gets all the emailadresses and URLS from that page.
    Then I want the URLS to be stored in the EmailAdresses database and the URLS in the URLS database.

    This creates a somewhat infinate loop, because it keeps on adding URLS from the URLS database, and keeps on reading them. Atleast that is the idea. Only, I don't know how to quite do it with the databases. I got this code from the Internet:

    Code Snippet

    OleDbConnection m_cnADONetConnection = new OleDbConnection();
    m_cnADONetConnection.ConnectionString = @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=D:\Anansi\database.mdb";
    Console.WriteLine(" -Connection made");
    m_cnADONetConnection.Open();
    Console.WriteLine(" -Connection opened");

    OleDbDataAdapter m_daDataAdapter = new OleDbDataAdapter();
    m_daDataAdapter = new OleDbDataAdapter("Select * From urls", m_cnADONetConnection);
    OleDbCommandBuilder m_cbCommandBuilder = new OleDbCommandBuilder(m_daDataAdapter);
    DataTable m_dtContacts = new DataTable();
    m_daDataAdapter.Fill(m_dtContacts);



    I don't understand this. Could anyone care to explain to me what exactly is a databuilder and a dataadapter. And also, can I use the same connection to connect to 2 different databases, and use 2 databuilders to add rows to those databases?

    Thanks for your time. And sorry for my spelling mistakes. I know I have to work on my English skills.

    ~Timo Willemsen


    Wednesday, June 13, 2007 11:54 AM

All replies

  • In .NET there are two standard ways to interact with a database: data readers and data sets.  Each has its advantages and therefore you should evaluate both options. 

     

    Data readers are the fastest but require that you store the data locally yourself using business objects or something.  Data readers also require a connection to the database until all the data has been read.  Finally data readers allow forward, read-only access to the data.  If you want to change the data then you can't use a data reader.  My general recommendation is to use a data reader if you are going to store the data into business objects otherwise use a data set.

     

    Data sets use data readers behind the scenes but a data set is effectively a copy of the table(s) of the database in local memory.  The data is retrieved, the table and column structure is built up and the data is stored.  Constraints and references are enforced in the data set.  Data sets take up a lot of memory so you don't want to use them for holding large sets of data but they do allow you to make changes to the data and easily flush it back to the database.

     

    To connect to a database you will use a connection object such as OleDbConnection or whatever the most appropriate connection class is for your provider.  This connection is limited to a single database.  It implements IDisposable so you should wrap the variable in a using statement to ensure the connection gets cleaned up properly.  Once you create the connection you need to open it using Open.  This makes the actual connection to the DB.

     

    Now you can run commands against the DB.  To run a command you will normally create a command object such as OleDbCommand.  The command can be either a stored procedure or raw SQL.  You can also set up any parameter needed. 

     

    Code Snippet

    using(SqlConnection conn = new SqlConnection("Data Source=(local);Initial Catalog=MyCatalog;Integrated Security=true"))

    {

       //Create the command

       SqlCommand cmd = new SqlCommand(conn);

       cmd.CommandText = "SomeSprocName";

       cmd.CommandType = CommandType.StoredProcedure;

     

       //Open DB

       conn.Open();

     

       //Run command

       ...

    };

      

    Now that you've got the command you need to execute it.  What you're executing determines how you do it.  If you want to insert or update data then you generally don't care about the results so you will use ExecuteNonQuery which simply runs the command.  If the command returns a single value (such as a result code) then use ExecuteScalar instead.

     

    Code Snippet

    using (...)

    {

       ...

     

       //Update data

       cmd.ExecuteNonQuery();

    };

     

    If you wan to retrieve rows of data then you will either use the data reader or the data set.  For a data reader you will use ExecuteReader to get back a data reader.  You then enumerate the rows until there is no more data.  For each row you need to store the data somewhere.  Also with a data reader no type-checking is done so you need to know both the column name and its type to retrieve the data.  As with a command a reader implements IDisposable.

     

    Code Snippet

    using (...)

    {

       using(SqlDataReader dr = cmd.ExecuteReader())

       {

          if (dr != null)

          {

             while (dr.Read())

             {

                int id = Convert.ToInt32(dr["id"]);

                string name = Convert.ToString(dr["name"]);

                ...

             };

          };

       };

    };

     

     

    Data sets require the same amount of code but work a little different.  You must create a data adapter and associate the command(s) with it in order to populate a data set.  You only need to set the commands you're actually going to use.  The nice thing about a data set is that it can maintain multiple tables of data simultaneously so references are intact.

     

    Code Snippet

    using(...)

    {

       ...

     

       SqlDataAdapter da = new SqlDataAdapter();

       da.SelectCommand = cmd;

       da.InsertCommand = ...;

       da.UpdateCommand = ...;

       da.DeleteCommand = ...;

     

       DataSet ds = new DataSet();

       da.Fill(ds);

    };

     

     

    A few words of caution.  Be sure to use using with the appropriate data objects otherwise you'll leak connections and potentially lock your database.  Data reader requires a lot of code so it is useful to isolate this into a single function.  A connection can have only a single reader open (in almost all cases) on it at a time (remember that a data set use a reader) therefore you should not share connections across threads.  Unlike previous frameworks it is best to open a connection, do your work and close the connection rather than hold the connection open for a long time.  The ADO.NET library is optimized for this behavior and so it improves scalability without impacting performance.

     

    I'll make a plug for my class library that can reduce the above code (excluding the copying of the data to a business object) to a single line of code in the best case.   

    Code Snippet

    //Business class

    public class Employee

    {

       ...

    }

     

    //During app init

    ConnectionManager.Initialize(new SqlConnectionManager("..."));

     

    //Later

    Employee[] employees = ConnectionManager.QueryResults("GetEmployees",

          CommandType.StoredProcedure, ParseEmployee, null);

     

    Refer to my web site for more info if you're interested.  You'll probably end up having to create an Access provider but it is pretty straightforward.

     

     Michael Taylor - 6/13/07

    http://p3net.mvps.org

     

    Wednesday, June 13, 2007 12:43 PM
  • Thanks a lot for your help you made me understand it =)

          static void InsertRowURL(String URL, String dateAdded, String parentPage)
            {
                String myConnectionString = @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=D:\Anansi\database.mdb";
               
                OleDbConnection myConnection = new OleDbConnection(myConnectionString);
                string myInsertQuery = "INSERT INTO URLS (url, dateAdded,  parentPage) Values('" + URL + "', '" + dateAdded + "', '" + parentPage + "');
                OleDbCommand myCommand = new OleDbCommand(myInsertQuery);
                myCommand.Connection = myConnection;
                myConnection.Open();
                myCommand.ExecuteNonQuery();
                myCommand.Connection.Close();
            }
    Code Snippet

    This is the function I use now to add to the database, yeah I know it's quite inefficient and it's from the MSDN library.
    But how can I make it so, that it won't add duplicates...

    Anyone have an idea?

    Wednesday, June 13, 2007 4:16 PM
  • A simple way to do this in SQL:

    IF NOT EXISTS (SELECT url FROM urls WHERE url='xxx')
       INSERT INTO URLS ( url, ... ) VALUES ( 'xxx', ...)

    Wednesday, June 13, 2007 5:19 PM
  • Thanks for your help Robbert

    Only, when I try to do that in my function, it's not possible, because it expects something like "INSERT" "UPDATE" etc as the first word of the sql query

              
    Code Snippet

      String myConnectionString = @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=D:\Anansi\database.mdb";
              
                OleDbConnection myConnection = new OleDbConnection(myConnectionString);
                string myInsertQuery = "IF NOT EXISTS (SELECT url FROM urls WHERE url='" + URL + "') INSERT INTO URLS ( url, dateAdded, parentPage ) VALUES ('" + URL + "', '" + dateAdded + "', '" + parentPage + "')";
                OleDbCommand myCommand = new OleDbCommand(myInsertQuery);
                myCommand.Connection = myConnection;
                myConnection.Open();
                myCommand.ExecuteNonQuery();
                myCommand.Connection.Close();


    Wednesday, June 13, 2007 5:55 PM
  • The command should be valid.  Put a BP on the command after you set the query string and then look at the string.  Verify it is formatted properly.  I'd wager it isn't.  If it looks right then try running in the database directly.  Ultimately you should consider using a sproc if your DB supports it as you can do a lot better validation.  You can also set up unique keys in the DB schema to enforce the uniqueness rather than doing it yourself.

     

    Finally be sure to wrap the OleDbConnection stuff in a using statement.  You don't need to call Close then and even if an exception occurs the connection will still be cleaned up (which won't happen right now).

     

    Michael Taylor - 6/14/07

    http://p3net.mvps.org

     

    Thursday, June 14, 2007 12:48 PM
  • Also, you will make your life much easier if you formulate your queries using String.Format instead of via concatenation, e.g.

       string sql = String.Format( "IF NOT EXISTS (SELECT url FROM urls WHERE url='{0}') INSERT INTO urls ( url, dateAdded, parentPage ) VALUES ('{0}', '{1}', '{2}')", url, dateAdded, parentPage);

    Not only is this easier to read, and much easier to spot mismatched quotes in, it also produces a local variable that you can watch so that you can see that the query is properly formulated before you set the OleDbCommand. 

    In fact, if I'm formulating SQL in my app, often I'll just dump the SQL to the console while debugging, copy/paste it into Management Studio, and run it to check its validity and the kinds of results it's returning before I bother with the whole stack of objects between my code and the database.
    Saturday, June 16, 2007 8:12 AM