Sunday, August 26, 2007

Summary

Project Description

FieldStat is a tool for scanning a collection of application binaries in order to understand how a particular set of classes and methods are being used in practice.

One main purpose of the tool is to support prioritizing unit test coverage based on API usage data observed in practice. Developers can view how often and how many applications use certain types and method calls in relation to unit test coverage and difficulty of testing.
This shares similar goals with clean room software engineering in that a reliability model is based on statistical likelihood of occurring. That is, place more testing effort in more likely occurring scenarios.

The tool relies on the Mono.Cecil assembly to process CIL byte-code instructions.

Delivered

Planned
  • Get up to speed with Cecil.
  • Extract call graph.
  • Identify and isolate mono framework calls.
  • Implement CodeRank algorithm.
  • Integrate with MonoCov.
  • Test and Refine application.
  • Documentation: User manual, design docs, touch-up comments
Extra
  • Command Line and Graphics Interface
  • Export Data
  • Include other statistics such as application usage count, method complexity.
  • Plugin Architecture
  • Plugin for finding Design Fragments (Set of reoccurring system calls)
  • (External) Improve Mono.Cecil tool to have better support for reconstructing statements and message chain bad code detector.

Components

Application Repository

Understanding how API calls are used in practice requires sampling actual software. Unfortunately, access to business software is limited; however, many open source applications offer a good starting point.

Over 500 projects were downloaded from the code.google.com project. The projects were selected based on the label: Mono or CSharp. In addition, 2 projects from a company were included. The projects were manually built, or a binary distribution was acquired. Some projects had to be excluded due to immaturity(not building), misclassification, and lacking the appropriate resources to build the project.

Coverage Data

The monocov tool gathers statement coverage information from the run-time execution of a Mono application. The statement coverage can be gathered in the following manner.

In the mono distribution mcs/class/corlib directory:
> make run-test RUNTIME_FLAGS="--profile=monocov:outfile=corlib.cov,+[mscorlib]"
> monocov --export-xml=/tmp/corlib-cov corlib.cov

However, the generated xml file was intended for presentation, not importing. Instead,
a new export option is introduced.

> monocov --export-fieldstat=/tmp/corlib-cov corlib.cov

Scanners and Analysis

FieldStat uses a visitor/collector pattern for gathering statistics. A visitor class walks the assemblies, classes, and methods. To gather statistics, a collector class is registered with the visitor and is notified when a particular a node of interest is visited. For instance, a collector can be notified whenever a system call is encountered.

Some default collectors included in FieldStat.
  • AppStat - Simply counts the number of system calls used per application.
  • CodeRank - Build's an application's call tree and calculates the associated code rank of each application method.
  • TypeCount - Counts the number of times a system type and system method is statically called in an application.

Running the Tool

The simpliest way run FieldStat is to specify the directory containing the coverage information (the -export-fieldstat output from monocov).

> FieldStat --coverage-path="../../../Data/Coverage Data/coverage_output" FieldStat.exe

This would output the Results.xml files in the output/ directory.

The file can be read with a XML parser or with the DataTable.ReadFile( file ) call.

A record is as follows:

<results>
<type>System.IO.Path</type>
<method>GetFileNameWithoutExtension (string)</method>
<length>1</length>
<frequency>2</frequency>
<rankedfrequency>0.30</rankedfrequency>
<appfrequency>1</appfrequency>
<coverage>1</coverage>
</results>

Screenshots

Plugins and Future Directions

Plugin System

Plugins can be created by dropping in a file named *Plugin.dll into the Plugins/ directory.

Design Fragments

An example plugin is included in the source code as the DesignFragmentPlugin project.

A design fragment is a pattern or common usage of a framework. This plugin attempts to detect candidates for design fragments by looking at common sequences of calls. This then in turn could be used to improve the framework, or serve as documentation or snippets in how to use the framework.

For example, the following series of calls was found to occur 10 times in the application repository. It looks like the user is trying to format a date in a particular way:

System.DateTime.get_Day ()
System.Int32.ToString ()
System.DateTime.get_Month ()
System.DateTime.get_Year ()
System.Int32.ToString ()
System.String.get_Length ()
System.String.Concat (string,string)
System.String.get_Length ()
System.String.Substring (int,int)
System.String.Concat (System.String[])

The plugin is written as follows:

using FieldStat.DataCollection;
using FieldStat.CodeModel;

public class DesignFragmentPlugin : AbstractPlugin
{
public override void ComputeResults(Results results, ICollection files, Hashtable htBin)
{
Visit scan = new Visit();
scan.Collectors.Register("DesignFragment", new DesignFragmentCollector());

scan.DoScan( files, htBin);
MyCollector seqs = (MyCollector)scan.Collectors["DesignFragment"];
// Process Results ...
}
}

public class DesignFragmentCollector : AbstractCollector
{
public override void OnMethodBody(MethodBody body)
{
ArrayList seq = GetSystemCallSequences(body);
if (seq.Count > 3)
{
sequences.Add(EncodeSequenceCalls( seq ));
}
}
....
}

SystemSignature

This plugin extracts the system calls made in an application. Then the resulting system call signature is compared against the other application's signatures. The results can be clustered to find common type of applications. It may also be used to judge the implementation of a well-known type of application. For example, many irc clients have been written. How divergent is yours from the other known clients? Maybe you should have taken advantage of a different architecture that you were not aware of. (Plugin under development)

Monday, August 20, 2007

Done. (Sorta)

Yesterday I finished implementing all the features that I had planned (and more).

Some of those features:

* New CodeRank metric where the use of a system call is weight by that function's code rank.

* Provide a non-graphics interface to using tool.

* Provide the ability to write custom plugins for collecting data and creating reports from unit test data and example application repository.

* Enhanced the grid view to visualize coverage information.












I also have a couple more ideas for plugins and how to use the data, but first I want to check in my code and make sure there is sufficient documentation.

Friday, August 3, 2007

Introducing PopTrees (Detecting Message Chains)

I introduce a PopTree as a simple abstraction over CIL and demonstrate how they can be used to find Message Chains.

Detecting Message Chains.

Message chains violate the Law of Dementor[1] by accessing a long series of other object's Methods and Properties. Detecting Message Chains violations in CIL involves understanding the local structure of statements. A large subset of instructions can occur in the context of a passing values to a method call. This means understanding the structure of message chains entails understanding much of the CIL instructions used in passing arguments. In addition, the instructions have to be understood to tell the difference between a sequence of method calls and a chain of method calls.

The problem is that there isn't good support for representing the instructions as more complex statements. Usually this is done ad hoc for the purpose of a person's analysis. Some code in Cecil.FlowAnalysis.ActionFlow has attempted to provide some abstractions over CIL; however, it only implemented a small subset of the instructions. In addition, the design tried prematurely canonicalizing the instruction tree making it difficult to extend to the full instruction set.

Instead, I introduce the concept of a PopTree. A PopTree roughly corresponds to a statement in the original source code. A PopTree is constructed by simulating the instruction stream to push and pop various values on a state machine. An instruction 'owns' another instruction if it pops it's value off the stack. A post-order transversal should closely correspond to the original AST. The hardest part was examining the spec to see how many pops and pushes an given instruction causes.

Algorithm:

public PopTreeList Build( ICollection instructions )
{
foreach (Instruction i in instructions)
{
PopTree p = new PopTree(i);
int pops = InstructionInfo.Pops(i,_machineStack.Count);
if (pops > _machineStack.Count)
{
// can happen with exception blocks whose support is flaky.
//Console.WriteLine("Attempting to pop more than machine has: " + i.OpCode.ToString() + " " + i.Operand);
return null;
}
for (int x = 0; x <>
{
p.Consume(Pop());
}
if (InstructionInfo.Pushes(i))
{
Push(p);
}
else
{
_popTreeList.Add(p);
}
}
return _popTreeList;
}

CODE:
m1( a1() ).m2( a2() ).m3( a3() )

CIL:
ld this ; push [this]
dup ; push [this]
call a1 ; pop [this] -> push [value]
call m1 ; pop [this] pop [value] -> push [obj]
ld this ; push [this]
call a2 ; pop [this] -> push [value2]
call m2 ; pop [obj] pop[value2] -> push [obj2]
ld this ; push [this]
call a3 ; pop [this] -> push [value3]
call m3 ; pop [obj2] pop [value3] -> push [obj3]

POP TREE:

[m3]
[m2] [a3]
[m1] [a2] [this]
[this] [a1] [this]
[this]

As an example, I have written an scanner that can find message chains in source code. This works decently well. However, I have to fix the Cecil.FlowGraph.ControlFlow model so that it can properly handle more advanced control flow -- that way things will work on real-world assemblies.

Another thing that should be easy is to determine all the variables used in the context of an if statement. This should make doing branch coverage and other analysis needing to get predicates easier to perform.

[1] http://en.wikipedia.org/wiki/Law_of_Demeter

Friday, July 27, 2007

Stats

Percentage Used Methods Covered: 0.2572512

Of the system methods used in the wild, about 75% of them are uncovered/ no unit tests.

Top10 Frequent Uncovered Methods:

System.Type.GetTypeFromHandle(runtimetypehandle)[9666],
System.Runtime.InteropServices.UCOMIConnectionPoint.Unadvise (int)[5629],
System.Runtime.InteropServices.UCOMIConnectionPoint.Advise (object,int&)[5509],
System.String.get_Length ()[4866],System.Object.GetType ()[2976],
System.Runtime.InteropServices.GCHandle.Alloc (object,gchandletype)[2744],
System.Runtime.InteropServices.GCHandle.AddrOfPinnedObject ()[2708],
System.String.get_Chars (int)[1751],System.Type.get_FullName ()[1706],
System.Data.DataRow.get_Item (string)[1590]

Top10 Types Used:

System.Windows.Forms.Control: 39031,
System.String: 37039,
System.Collections.ArrayList: 33015,
System.Threading.Monitor: 29379,
System.Type: 15596,
System.Collections.IEnumerator: 12835,
System.Runtime.InteropServices.UCOMIConnectionPoint: 11138,
System.Collections.Hashtable: 10776,
System.Delegate: 10191,
System.Runtime.InteropServices.GCHandle: 10056

Correctness+Completeness

My last work focused on feasibility and demonstrating the concept. This time I focused on correctness and completeness. For example, the analysis didn't distinguish overloads. Plus I suspected that only some of the coverage data was being joined with the usage data. I also worked on some minor enhancements like getting the class library method length.

Turns out the monocov tool outputted the method name without any parameters if it didn't have coverage data. I was only matching against ones without coverage. If it did have coverage data, it would generate a name like:
MethodName (string,byte[],System.Data.DataTable)

I have my tool match this same format so that it could work with the overloads. Unfortunately, this results in some custom code because the output shortens certain types like:
String.String-> string
String.Int32[] -> int[]
System.Object -> object
so I had to do the same.

Currently there is still a problem if the type is a reference (System.String&)

I also fixed monocov so that it would be consistent in how it would output method names without coverage data. Otherwise, it would just repeat the same method name for output.
I also fixed a bug in my simplified exporter that would use the same class name if more than one class was in the source file. While I was in monocov, I also added the feature to be able to export the length of the method.

Now that I can generate much more sound results, I will be calculating some statistics and be posting them shortly.

Friday, July 13, 2007

Surprised?

Unit Testing Effective?

Apparently there is an opposite correlation between methods used in the field, and unit testing coverage. That is, the methods that the unit test cover, were not used in the field. The methods that were used, had no coverage.

A larger sampling and verification of the results is necessary, however it is an interesting result so far.

Data Collection Methods

Over 500 projects were downloaded from the code.google.com project. The projects were selected based on the label: Mono or CSharp. In addition, 2 projects from a company were included. The projects were manually built, or a binary distribution was acquired. Some projects had to be excluded due to immaturity(not building), misclassification, and lacking the appropriate resources to build the project.

Results

For this run, only 35 of the projects were included in these results.









The information shown includes the type.method and corresponding:
  • Frequency across all applications.
  • Number of applications that the type.method appears in.
  • Coverage for that type.method. (1 = 100%, 0.50 = 50%, 0 = 0%, -1 = no test data)
Note: Coverage data was not collected for Managed.Windows.Form at this time.

One thing I think would help would be to display the (lines of code, complexity of method) in order to estimate how hard the method is to test or if it is worth testing.

Viewing the Results Yourself

I uploaded the results here:
http://groups.google.com/group/mono-soc-2007/web/FieldStatResults.xml

You can get the mono student projects from here
svn checkout http://mono-soc-2007.googlecode.com/svn/trunk/ mono-soc-2007

Compile from source or run the executable (Relies on WindowForms) under the directory:
christopher/FieldStat/FieldStat/bin/Release/FieldStat.exe

You can easily load the results under the Results tab using the(Import Results) button. Load the xml file on the web, or in christopher/FieldStat/Data/Results/FieldStatResults.xml

The default sort is on AppFrequency, you can resort on the other fields by clicking the columns.

Lessons and Design Issues

My original focus was initially just on accurately getting usage results from one executable and comparing that to the coverage data. However, I soon realized whenI was working with real data that the real challenge was with dealing with many applications and how the information varies across them. There are some interesting emergent problems. For instance, an application uses the log4net library. Is that treated as part of the application? What if several applications are linking that library.

The collection of the applications for sampling field data should be a project in itself. There should be easy automated way to scan code.google.com, sourceforge, etc and gather executables as samples.

Tuesday, July 3, 2007

Prototype Complete

I was recently squirreled away in the Rocky Mountains of Canada, so I couldn't make updates over the wire. However, I have successful completed the prototype mono coverage tool.

So far, I have only ran the program on a small set of data and coverage data, so I cannot make any bold statements about any glaring mismatches in unit testing. However, I do have some observations.
  • Statements that are uncovered, tend to be paths for exceptional behavior.
  • I think more levels of coverage should be supported: Branch, conditionals, etc.
  • The monocov tool could use a good round of refactoring.
Some stumbling blocks included
  • extra hoops in building mono as a non-root user (prefixes, no ldconfig, etc)
  • The monocov tool's export was intended for pretty presentation, rather than exporting to other tools.
Outstanding tasks include:
  • Frequency information isn't accounting for overloads.
  • Need to collect all coverage information and run the tool over several field binaries.
Phase 2 developments steps include implementing CodeRank to weight the frequency, and introducing an alternative user interface for viewing the data.

Friday, June 22, 2007

Mono Test Coverage

I'm in the process of collecting profile data from the various unit tests on Mono code.
That way I can pipe the profile data through monocov, and then spit out the coverage data and associate it with frequency information collected from the field.

The current coverage tool, Monocov, isn't written/distributed in a very Windows friendly manager. In addition, I have a feeling that Mono could use a good streamline process for their unit tests. It appears they used to run a webserver that was updated with test suite status about 2 years ago, but since have stopped.

Monday, June 11, 2007

Prototype Started...









Since I did all I could with my ASE submission, I decided to take my previous hacks/scripts, and make an initial stab at a prototype.

The protoype loads an assembly(s), and performs a scan to count references to system type method calls. This scan is from a professionally developed C# application. In this example, String.Concat is statically referenced 1148 times. This started me thinking what other information could you use with a method usage profile? Classification, bad call smells?

My next step will be to associate this information with the unit test coverage tool. Then, the prototype would have completed the full process. However, there are several problems with the basic frequency metrics and straightforward tool. (1) The calling context is not taken into account (which is related to dynamic usage is not static usage). (2) The nature of the coverage data and Mono code structure is not accounted for. (3) The presentation of information can use a more advanced task-oriented interface, report, and/or visualization options.

Sunday, June 10, 2007

Status #2

I have been spending this week working on a conference paper deadline. I had did some work up front before Google Summer of Code started in anticipation of this paper deadline.

This coming week, I want to have a prototype that runs through the entire process of the tool. The prototype will start with a simple user interface that allows selection and parameterization of the deployed executable. Then it will run an method/type usage report and present the results.

Saturday, June 2, 2007

Status

I have used a more robust way of identifying to which assembly a call belongs. Rather than just the namespace, the signed key for the assembly can be accessed through the Context property.

I imagine on the front end that one way a user would want to make a query on the types usage is something like: System.Text.* or System.*.

I have talked with Alex Orso who has pointed out some related work in residual coverage testing.

Furthermore, I have began setting up my testbed. I've been trying to gather various Mono applications in the wild. It is not as straight-forward as it sounds considering how diverse each project configuration is. So any donations for Mono executables would be accepted :)

Monday, May 28, 2007

Thoughts after ICSE

I've just returned from ICSE May 18th-May 26th. I had the chance to attend several workshops and talks and meet with several people.

One interesting thing discussed during the Mining Software Repository workshop was several ways of correlating defect density with software modules. One simple metric with high correlation was importing a package or namespace. In Eclipse, a class importing the 'compiler' package has a 71% chance of containing a bug. This is likely due to some concepts being more error-prone and show up when importing the package.

One discussion I had with Jon Cook was the difference between run-time usage frequency versus static frequency. We concluded that CodeRank or (some other static analysis) would potentially try to approximate what the distribution of run-time frequency would be against the static call sites. Empirical evaluation would be needed to confirm how well something like CodeRank approximates run-time distributions of usage. If this turns out to not work well, then a backup plan would be to use instruction coverage or runtime profiles from programs such as gcov.

I will be talking with Alex Orso this week or next for his opinion.

I also had some of my own thoughts about how to visualize the eventual output.
This is an interesting visualization problem because a framework developer may want to see the distribution of types/methods across several deployed applications.

Friday, May 18, 2007

ICSE and Plan

I will be leaving today to attend the International Conference on Software Engineering. My summer of code project will be one of the things on the agenda that I hope to get some feedback on.

That said, I want to post within the next few days a more thorough development plan that includes more concrete milestones and deliverables.

Monday, May 14, 2007

System Calls

Found a simple way to isolate system calls versus non-framework calls.

In MSIL, a call instruction includes the fully qualified name. Therefore, using Cecil you can do the following:

if (i.OpCode == Mono.Cecil.Cil.OpCodes.Call)
{
MethodReference rf = (MethodReference)i.Operand;
if (rf.DeclaringType.Namespace.StartsWith("System"))
{
systemCount++;
}
else
{
nonSystemCount++;
}
}


Currently considering what architecture is best for storing type/method call frequency which would be more complicated than simple system-calls frequency.

Sunday, May 13, 2007

Mono.Cecil

Downloaded and played around with Mono.Cecil. The framework seems very similar with instrumentor code I've written using Rails.NET. Wondering if I should try out a mini-project to get my hands more dirty with Cecil.

Wednesday, May 9, 2007

Day 1

Establishing blog for the purpose of tracking progress for my google summer of code project.

This is the text of my abstract:

Code coverage, typically referring to statement coverage, is a technique for estimating the adequacy of test cases. Achieving 100% code coverage is very difficult if not impossible. Prioritizing the development of new test cases to cover untested code requires an understanding of which criteria is most important. One criterion is to test code that is more likely to generate faults. Another criterion is to test code that is more frequently used from which it follows that code is more important and needs to be reliable.

An approach for favoring the important but uncovered code involves an algorithm that is capable of ranking the importance of code. CodeRank is a technique that is similar in spirit with Google’s PageRank –- important methods link to other important methods. The CodeRank creates an ordered ranking of all the methods where each rating assigned to a method gives its relative percent importance. This rating can be scaled by other factors including call frequency.

The delivered outcome will be a modified version of MonoCov that presents the code coverage criteria, but includes the option to rank the output by CodeRank. Future work would allow visualizing prioritized code coverage in a treemap representation.