Summer of Code: May 2007

Monday, May 28, 2007

Thoughts after ICSE

I've just returned from ICSE (May 18th to May 26th), where I attended several workshops and talks and met a number of people.

One interesting topic at the Mining Software Repositories workshop was the various ways of correlating defect density with software modules. One simple predictor that correlated strongly with defects was whether a class imports a particular package or namespace. In Eclipse, a class importing the 'compiler' package has a 71% chance of containing a bug. This is likely because some concepts are inherently more error-prone, and importing the package is a visible sign that a class deals with them.
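A figure like the 71% above is a conditional probability: among classes that import a package, the fraction that had a defect. The sketch below computes that ratio. `ClassInfo`, `ImportDefectMetric` and `DefectRateFor` are hypothetical names, and the sketch assumes defects have already been mapped to classes (in practice the hard part, and not necessarily how the workshop results were produced).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record: one entry per class, listing the packages it imports
// and whether a defect was later reported against it.
public record ClassInfo(string Name, string[] Imports, bool HasDefect);

public static class ImportDefectMetric
{
    // Estimates P(defect | class imports `package`) as the fraction of
    // importing classes that had at least one defect.
    // Returns null when no class imports the package.
    public static double? DefectRateFor(IEnumerable<ClassInfo> classes, string package)
    {
        var importers = classes.Where(c => c.Imports.Contains(package)).ToList();
        if (importers.Count == 0) return null;
        return (double)importers.Count(c => c.HasDefect) / importers.Count;
    }
}
```

With four classes, three of which import `compiler` and two of those defective, `DefectRateFor(data, "compiler")` gives 2/3.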

One discussion I had with Jon Cook concerned the difference between run-time usage frequency and static frequency. We concluded that CodeRank (or some other static analysis) would, in effect, be trying to approximate the run-time frequency distribution from the static call sites. Empirical evaluation would be needed to confirm how well something like CodeRank approximates run-time usage distributions. If it turns out not to work well, a backup plan would be to use instruction coverage or run-time profiles from tools such as gcov.
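One way to run that empirical check is to compare the ordering a static analysis gives methods with the ordering implied by a run-time profile. Spearman's rank correlation does exactly this: +1 means identical orderings, -1 means reversed. Choosing Spearman is an assumption on my part, not something we settled on, and the sketch assumes both arrays are aligned (the same method at the same index). `RankCorrelation` and `Spearman` are hypothetical names.

```csharp
using System;
using System.Linq;

public static class RankCorrelation
{
    // Converts values to 1-based ranks (smallest = 1); tied values
    // share the average of the ranks they span.
    static double[] Ranks(double[] values)
    {
        int n = values.Length;
        int[] order = Enumerable.Range(0, n).OrderBy(i => values[i]).ToArray();
        var ranks = new double[n];
        int start = 0;
        while (start < n)
        {
            int end = start;
            while (end + 1 < n && values[order[end + 1]] == values[order[start]]) end++;
            double avg = (start + end) / 2.0 + 1;
            for (int k = start; k <= end; k++) ranks[order[k]] = avg;
            start = end + 1;
        }
        return ranks;
    }

    // Spearman's rho: the Pearson correlation of the two rank vectors.
    // Returns NaN if either series is constant (zero rank variance).
    public static double Spearman(double[] staticScores, double[] runtimeCounts)
    {
        if (staticScores.Length != runtimeCounts.Length || staticScores.Length < 2)
            throw new ArgumentException("Need two equal-length series with at least 2 items.");
        double[] rx = Ranks(staticScores), ry = Ranks(runtimeCounts);
        double mx = rx.Average(), my = ry.Average();
        double cov = 0, vx = 0, vy = 0;
        for (int i = 0; i < rx.Length; i++)
        {
            cov += (rx[i] - mx) * (ry[i] - my);
            vx += (rx[i] - mx) * (rx[i] - mx);
            vy += (ry[i] - my) * (ry[i] - my);
        }
        return cov / Math.Sqrt(vx * vy);
    }
}
```

Ranks are used rather than raw values because static scores and run-time counts live on very different scales; only the ordering needs to agree.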

I will be talking with Alex Orso this week or next to get his opinion.

I also had some of my own thoughts about how to visualize the eventual output.
This is an interesting visualization problem because a framework developer may want to see the distribution of types/methods across several deployed applications.

Friday, May 18, 2007

ICSE and Plan

I am leaving today to attend the International Conference on Software Engineering. My Summer of Code project is one of the things I hope to get feedback on while I'm there.

Within the next few days, I also want to post a more thorough development plan that includes more concrete milestones and deliverables.

Monday, May 14, 2007

System Calls

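Found a simple way to separate calls into the framework's System namespaces from calls to non-framework code.

In MSIL, the operand of a call instruction refers to the target method by its fully qualified name, which Cecil exposes as a MethodReference. Therefore, with Cecil you can classify each call by the namespace of its declaring type: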


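// Assumes: using Mono.Cecil; using Mono.Cecil.Cil;
// `method` is a MethodDefinition with method.HasBody == true.
foreach (Instruction i in method.Body.Instructions)
{
    // C# compiles most instance calls to callvirt, so check both opcodes.
    if (i.OpCode == OpCodes.Call || i.OpCode == OpCodes.Callvirt)
    {
        MethodReference rf = (MethodReference)i.Operand;
        // Nested types report an empty Namespace; use the outermost type.
        TypeReference t = rf.DeclaringType;
        while (t.DeclaringType != null)
            t = t.DeclaringType;
        // Match "System" and "System.*" but not e.g. "SystemsLib".
        if (t.Namespace == "System" || t.Namespace.StartsWith("System."))
            systemCount++;
        else
            nonSystemCount++;
    }
}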

I'm currently considering what architecture is best for storing per-type and per-method call frequencies, which is more involved than a single system/non-system count.
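One candidate is a two-level map, type name to method name to count, from which per-type totals and "most-called" lists fall out directly. This is a minimal sketch of that option only. `CallFrequencyTable` and its methods are hypothetical, and keying methods by bare name merges overloads; keying by the full signature (for example Cecil's `MethodReference.FullName`) would keep them apart.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical store: type full name -> method name -> call count.
public class CallFrequencyTable
{
    readonly Dictionary<string, Dictionary<string, int>> counts = new();

    // Records one call site (or one run-time call) to typeName::methodName.
    public void Record(string typeName, string methodName)
    {
        if (!counts.TryGetValue(typeName, out var methods))
        {
            methods = new Dictionary<string, int>();
            counts[typeName] = methods;
        }
        methods.TryGetValue(methodName, out int n); // n is 0 if absent
        methods[methodName] = n + 1;
    }

    public int MethodCount(string typeName, string methodName) =>
        counts.TryGetValue(typeName, out var m) && m.TryGetValue(methodName, out int n) ? n : 0;

    // A type's frequency is the sum over its methods.
    public int TypeCount(string typeName) =>
        counts.TryGetValue(typeName, out var m) ? m.Values.Sum() : 0;

    // The k most-called types, most-called first.
    public IEnumerable<(string Type, int Calls)> TopTypes(int k) =>
        counts.Select(kv => (Type: kv.Key, Calls: kv.Value.Values.Sum()))
              .OrderByDescending(t => t.Calls)
              .Take(k);
}
```

Nesting by type first makes type-level queries cheap, which suits a treemap-style view where types are the outer rectangles and methods the inner ones.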

Sunday, May 13, 2007

Mono.Cecil

Downloaded and played around with Mono.Cecil. The framework seems very similar to instrumentation code I've written using Rails.NET. I'm wondering whether I should try a mini-project to get more hands-on experience with Cecil.

Wednesday, May 9, 2007

Day 1

I'm establishing this blog to track progress on my Google Summer of Code project.

This is the text of my abstract:

Code coverage, typically referring to statement coverage, is a technique for estimating the adequacy of test cases. Achieving 100% code coverage is very difficult, if not impossible. Prioritizing the development of new test cases to cover untested code therefore requires an understanding of which criteria matter most. One criterion is to test code that is more likely to contain faults. Another criterion is to test code that is used more frequently, since frequently used code is more important and needs to be reliable.

An approach for favoring important but uncovered code requires an algorithm that can rank the importance of code. CodeRank is a technique similar in spirit to Google’s PageRank: a method is important if it is called by other important methods. CodeRank produces an ordered ranking of all the methods, where the rating assigned to each method gives its relative importance as a percentage. This rating can be scaled by other factors, including call frequency.
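The abstract does not spell out CodeRank's exact formulation, so the following is a PageRank-style power iteration over a call graph, offered as a sketch under stated assumptions: an edge runs from caller to callee, each caller splits its score evenly among the methods it calls, the damping factor is 0.85 (PageRank's customary value), and methods that call nothing spread their score uniformly. `CodeRankSketch.Compute` and its parameters are hypothetical; scaling by call frequency would mean weighting edges, which this sketch does not do.

```csharp
using System;
using System.Linq;

public static class CodeRankSketch
{
    // calls[m] lists the indices (0..n-1) of the methods that method m calls.
    // Returns one score per method; scores sum to 1, so score * 100 reads
    // as a relative percent importance.
    public static double[] Compute(int[][] calls, double damping = 0.85,
                                   int maxIterations = 100, double tolerance = 1e-10)
    {
        int n = calls.Length;
        var rank = Enumerable.Repeat(1.0 / n, n).ToArray();
        for (int iter = 0; iter < maxIterations; iter++)
        {
            var next = new double[n];
            double danglingMass = 0;
            for (int m = 0; m < n; m++)
            {
                var callees = calls[m].Distinct().ToArray(); // count each edge once
                if (callees.Length == 0) { danglingMass += rank[m]; continue; }
                // A caller passes its importance on to its callees in equal shares.
                foreach (int c in callees) next[c] += damping * rank[m] / callees.Length;
            }
            // Random-jump term plus the mass from methods that call nothing.
            double baseScore = (1 - damping) / n + damping * danglingMass / n;
            for (int m = 0; m < n; m++) next[m] += baseScore;

            double delta = 0;
            for (int m = 0; m < n; m++) delta += Math.Abs(next[m] - rank[m]);
            rank = next;
            if (delta < tolerance) break;
        }
        return rank;
    }
}
```

For example, if methods 0 and 1 both call method 2 and method 2 calls nothing, method 2 ends up with about 57% of the total and methods 0 and 1 with about 21% each.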

The delivered outcome will be a modified version of MonoCov that still presents code coverage results but adds an option to rank the output by CodeRank. Future work could visualize prioritized code coverage as a treemap.
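The abstract leaves open exactly how rank and coverage combine in the output. One simple option, assumed here, is to weight each method's CodeRank by its uncovered fraction, so important code with little coverage rises to the top, with CodeRank alone breaking ties. `MethodCoverage` and `PrioritizedCoverage.Prioritize` are hypothetical names and do not reflect MonoCov's actual data model.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical per-method row: CodeRank score and the fraction of the
// method's statements covered by tests (0.0 to 1.0).
public record MethodCoverage(string Method, double CodeRank, double Covered);

public static class PrioritizedCoverage
{
    // Orders methods so important, poorly covered code comes first:
    // priority = CodeRank * (1 - Covered); ties fall back to CodeRank.
    public static List<MethodCoverage> Prioritize(IEnumerable<MethodCoverage> rows) =>
        rows.OrderByDescending(r => r.CodeRank * (1 - r.Covered))
            .ThenByDescending(r => r.CodeRank)
            .ToList();
}
```

A fully covered method scores zero however important it is, which matches the goal of directing new tests at uncovered code.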
