School Project, an anti-cheat code comparer :)

  • Question

  • Because I am currently working with MOSS 2007 and have a final project coming up, I was thinking of using WSS 3.0 to build this application.
    I need software that can detect possible fraud in school projects, and it should support Java, C and C++.
    The application should analyze the submitted code and compare it with other code already submitted and stored.

    My main tasks now are requirements, architecture and design.

    I wonder if anyone can point me to good resources, ideas, suggestions and best practices for this kind of project.

    I was thinking of a web front end, so every teacher could access it from anywhere, so why not use WSS 3.0? It has a lot of features that would fit nicely, like lists (of projects, etc.).

    The projects are usually submitted as a zip file, so a teacher would add a group's zip to a list and the application would then start analyzing it: opening the zip, checking the directory structure, comparing files, and then opening the code files and comparing their structure. Here I was thinking of some sort of XML like:

    <start>
    <variableDeclaration type="string" name="aux" value="10" />
    <loop type="for" var="i" until="aux" pass="i++">
    <otherReserved type="Console.Write" value="i" />
    </loop>
    </start>

    This way, even if they changed the names of variables, etc., the algorithm is pretty much the same.
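
    As a rough illustration of why renaming does not hide the structure, here is a minimal Python sketch (Python is only used for illustration; the regular expression, the `KEYWORDS` set and the `normalize` helper are assumptions, not a real parser for Java, C or C++). It replaces every identifier and literal with a placeholder, so two loops that differ only in variable names produce the same token list.

```python
import re

# Small keyword set shared by C, C++ and Java; illustrative, not complete.
KEYWORDS = {
    "if", "else", "for", "while", "do", "switch", "case", "default",
    "break", "continue", "return", "int", "long", "char", "float",
    "double", "void", "class", "public", "private", "static", "new",
}

# Order matters: comments and strings are tried before names and operators.
TOKEN_RE = re.compile(r"""
    (?P<comment>//[^\n]*|/\*.*?\*/)
  | (?P<string>"(?:\\.|[^"\\])*")
  | (?P<number>\d+(?:\.\d+)?)
  | (?P<word>[A-Za-z_]\w*)
  | (?P<op>\+\+|--|&&|\|\||[<>=!]=|\S)
""", re.VERBOSE | re.DOTALL)


def normalize(source):
    """Return a token list with names and literals replaced by placeholders."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "comment":
            continue                      # comments carry no structure
        if kind == "string":
            tokens.append("STR")
        elif kind == "number":
            tokens.append("NUM")
        elif kind == "word":
            # Keywords are kept, every other name becomes "ID".
            tokens.append(text if text in KEYWORDS else "ID")
        else:
            tokens.append(text)
    return tokens


a = "for (int i = 0; i < aux; i++) { print(i); }"
b = "for (int k = 0; k < limit; k++) { print(k); } // renamed"
print(normalize(a) == normalize(b))  # True
```

    A real tool would need a proper lexer per language (preprocessor directives, character literals and so on are ignored here), but the placeholder idea is the same one the XML above is getting at.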

    If anyone understands any of this and has ideas, please tell me :)


    Thursday, March 6, 2008 4:32 PM

Answers

  • My thought on that is that it's simple to do a search and replace on variable names, but it's a lot harder to change the structure of the code if the person doing it doesn't fully understand that code.  If they did understand the code, why bother copying it in the first place?

     

    As I previously mentioned, being able to guarantee that someone copied code isn't very practical, in my opinion.

     

    You could look at cyclomatic complexity.

     

    I think anyone trying to copy code would generally have the intelligence to change the variable names, and comments to some extent, and that is why I didn't suggest looking there. 

     

    Perhaps if you looked at the IL, the complexity, the variables and formatting, the bugs (they're not something you'd copy on purpose I wouldn't think?) you could perhaps form an opinion of how close one piece of code is to the other.

     

    A high similarity score would lead to the conclusion that it's highly possible the two pieces of source code started off the same, but I wouldn't say that it guarantees it.
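
    One possible way to turn that into a number, sketched in Python under some loud assumptions: the submissions are already reduced to token lists, the score is the Jaccard overlap of runs of `k` consecutive tokens (a technique swapped in here for illustration, not something proposed in this thread), and the `0.8` threshold is arbitrary. All function names are hypothetical.

```python
def kgrams(tokens, k=4):
    """Set of all runs of k consecutive tokens."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def similarity(tokens_a, tokens_b, k=4):
    """Jaccard similarity of the two k-gram sets, between 0.0 and 1.0."""
    grams_a, grams_b = kgrams(tokens_a, k), kgrams(tokens_b, k)
    if not grams_a and not grams_b:
        return 0.0                        # both too short to compare
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def flag_pairs(submissions, threshold=0.8):
    """Yield (name_a, name_b, score) for pairs at or above the threshold."""
    names = sorted(submissions)
    for i, name_a in enumerate(names):
        for name_b in names[i + 1:]:
            score = similarity(submissions[name_a], submissions[name_b])
            if score >= threshold:
                yield name_a, name_b, score


# Made-up token lists standing in for three normalized submissions.
submissions = {
    "group1": "for ( ID = NUM ; ID < ID ; ID ++ ) { ID ( ID ) ; }".split(),
    "group2": "for ( ID = NUM ; ID < ID ; ID ++ ) { ID ( ID ) ; }".split(),
    "group3": "while ( ID > NUM ) { ID -- ; ID ( ID ) ; }".split(),
}
for name_a, name_b, score in flag_pairs(submissions):
    print(name_a, name_b, round(score, 2))
```

    A flagged pair is only a prompt for a teacher to look at the two projects side by side; as said above, the score on its own proves nothing.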

     

    I hope this helps,

     

    Martin Platt.

    Tuesday, March 11, 2008 3:19 AM

All replies

  • I'm sure there is a good amount of research out there on how to do this properly; however, I thought I'd spend 10 seconds thinking about it and post ;)

    My first thought was to consider using IL. E.g. if you conveniently ignore non-.NET languages, you could reflect the code first, which would remove a great deal of code noise. On a similar note, you might be able to utilise the CodeDom/Visual Studio extensions too. But in general I think MOSS sounds like a good choice, and the idea of parsing the code into XML seems good too.
    Thursday, March 6, 2008 9:17 PM
  • Pedro,

     

    Although I'm positive this would be possible, I don't know how plausible it would be.  What I mean is, you're checking to see if two code submissions are the same: possibly different variable names, but essentially the same code.  You could look into the IL that is generated; that would work.

     

    What I'd consider is, you're comparing public domain information, so how likely is it that two separate people could have identical, or incredibly similar, code?

     

    Put another way, if two people use the same software patterns, the same naming guidelines and the same architectural guidance, how sure could you be that one is a copy of the other?  At best it would only be an indicator of the level of difference.

     

    Legally speaking, from what I understand, it is incredibly difficult to prove that one software application was copied from another, in terms of a copy of IP, so I'd extend that to this situation, and say that this would also be difficult.

     

    That said, you could look at cyclomatic complexity as well as IL, and if they're identical in every way, you could say that they are identical, but unless there's a good reason to suspect that they're copies, I don't think you could make that assertion.

     

    Hope this helps,

     

    Martin Platt.

    Thursday, March 6, 2008 11:19 PM
  • The teacher's experience is that rarely or never does someone come up with the same solution in exactly the same way, and I tend to agree with him.

    Only at a more professional level do we start to see similar code.
    Thursday, March 6, 2008 11:52 PM
  • To an extent, I'd agree, but if two students started out knowing nothing and were taught by the same teacher in the same way, I'd still say it's possible to get the same, or very similar, code from two equally skilled students.

     

    That aside, as I said, I'd be looking at the IL, which is like the boiled down version of the original source code, so a lot of the work would already be done for you with the IL.

     

    Looking at cyclomatic complexity, that will tell you how complicated a solution is.  I would tend to agree that if both are very complex solutions, and both look the same, then copying has probably occurred.  If they're not complex, then they could both be good programmers who came to the same conclusions?
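
    For reference, a crude way to estimate cyclomatic complexity for C-like source is to count decision points and add one. The Python sketch below does that with regular expressions; it is an approximation under stated assumptions (no real parser, and things like the `?` wildcard in Java generics would be miscounted), and the function names are made up for illustration.

```python
import re

# Each of these adds one independent path through the code.
DECISION_RE = re.compile(r"\b(?:if|for|while|case|catch)\b|&&|\|\||\?")


def strip_comments_and_strings(source):
    """Remove comments and string literals so keywords inside them are ignored."""
    pattern = r'//[^\n]*|/\*.*?\*/|"(?:\\.|[^"\\])*"'
    return re.sub(pattern, " ", source, flags=re.DOTALL)


def cyclomatic_complexity(source):
    """Approximate McCabe complexity: number of decision points plus one."""
    code = strip_comments_and_strings(source)
    return len(DECISION_RE.findall(code)) + 1


sample = """
int clamp(int x, int lo, int hi) {
    // if this comment were counted, the score would be wrong
    if (x < lo) return lo;
    if (x > hi && hi >= lo) return hi;
    return x;
}
"""
print(cyclomatic_complexity(sample))  # 4
```

    Two simple exercises will often land on the same low number, which is the point made above: matching complexity only becomes interesting when the number is high.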

     

    I really do hope this helps,

     

    Martin Platt.

    Friday, March 7, 2008 12:04 AM
  •  

    That's a very interesting point. By examining the IL you've lost the developer detail and are left with very raw structures which, given the students are solving the same problem, may well look similar. Perhaps the better idea is to look at the other end, e.g. look at the spacing and the function/variable naming, i.e. the bits that would be most likely to differ if the students worked on it themselves.
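
    A small Python sketch of what looking at "the other end" might mean: collect a few surface habits per submission and see on how many of them two submissions agree. The chosen features and the names `style_fingerprint` and `shared_habits` are assumptions picked for illustration, not an established metric.

```python
import re


def style_fingerprint(source):
    """Collect a few surface habits: indentation, brace placement, naming."""
    lines = [line for line in source.splitlines() if line.strip()]
    indents = [line[:len(line) - len(line.lstrip())] for line in lines]
    names = set(re.findall(r"\b[A-Za-z_]\w*\b", source))
    return {
        "uses_tabs": any("\t" in indent for indent in indents),
        "brace_on_own_line": any(line.strip() == "{" for line in lines),
        "space_after_keyword": bool(re.search(r"\b(?:if|for|while) \(", source)),
        "snake_case_names": sum("_" in name for name in names),
        "camel_case_names": sum(
            bool(re.search(r"[a-z][A-Z]", name)) for name in names
        ),
    }


def shared_habits(source_a, source_b):
    """Return the fingerprint keys on which two submissions agree."""
    a, b = style_fingerprint(source_a), style_fingerprint(source_b)
    return sorted(key for key in a if a[key] == b[key])


one = "int main()\n{\n\tint totalSum = 0;\n\tif (totalSum) return 1;\n}\n"
two = "int main() {\n    int total_sum = 0;\n    if(total_sum) return 1;\n}\n"
print(shared_habits(one, two))  # []
```

    These habits are easy to erase with an auto-formatter, so agreement here is weak evidence at best and would need to be weighed together with the structural checks.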

     

    Friday, March 7, 2008 9:54 AM
  • Thanks a lot, this has been very helpful. Now I will investigate this further :)

    Tuesday, March 11, 2008 4:17 PM