Sunday, August 26, 2007

Summary

Project Description

FieldStat is a tool for scanning a collection of application binaries in order to understand how a particular set of classes and methods are being used in practice.

One main purpose of the tool is to support prioritizing unit test coverage based on API usage data observed in practice. Developers can view how often and how many applications use certain types and method calls in relation to unit test coverage and difficulty of testing.
This shares similar goals with clean room software engineering in that a reliability model is based on statistical likelihood of occurring. That is, place more testing effort in more likely occurring scenarios.

The tool relies on the Mono.Cecil assembly to process CIL byte-code instructions.

Delivered

Planned

Get up to speed with Cecil.
Extract call graph.
Identify and isolate mono framework calls.
Implement CodeRank algorithm.
Integrate with MonoCov.
Test and Refine application.
Documentation: User manual, design docs, touch-up comments

Extra

Command Line and Graphics Interface
Export Data
Include other statistics such as application usage count, method complexity.
Plugin Architecture
Plugin for finding Design Fragments (Set of reoccurring system calls)
(External) Improve Mono.Cecil tool to have better support for reconstructing statements and message chain bad code detector.

Components

Application Repository

Understanding how API calls are used in practice requires sampling actual software. Unfortunately, access to business software is limited; however, many open source applications offer a good starting point.

Over 500 projects were downloaded from the code.google.com project. The projects were selected based on the label: Mono or CSharp. In addition, 2 projects from a company were included. The projects were manually built, or a binary distribution was acquired. Some projects had to be excluded due to immaturity(not building), misclassification, and lacking the appropriate resources to build the project.

Coverage Data

The monocov tool gathers statement coverage information from the run-time execution of a Mono application. The statement coverage can be gathered in the following manner.

In the mono distribution mcs/class/corlib directory:
> make run-test RUNTIME_FLAGS="--profile=monocov:outfile=corlib.cov,+[mscorlib]"
> monocov --export-xml=/tmp/corlib-cov corlib.cov

However, the generated xml file was intended for presentation, not importing. Instead,
a new export option is introduced.

> monocov --export-fieldstat=/tmp/corlib-cov corlib.cov

Scanners and Analysis

FieldStat uses a visitor/collector pattern for gathering statistics. A visitor class walks the assemblies, classes, and methods. To gather statistics, a collector class is registered with the visitor and is notified when a particular a node of interest is visited. For instance, a collector can be notified whenever a system call is encountered.

Some default collectors included in FieldStat.

AppStat - Simply counts the number of system calls used per application.
CodeRank - Build's an application's call tree and calculates the associated code rank of each application method.
TypeCount - Counts the number of times a system type and system method is statically called in an application.

Running the Tool

The simpliest way run FieldStat is to specify the directory containing the coverage information (the -export-fieldstat output from monocov).

> FieldStat --coverage-path="../../../Data/Coverage Data/coverage_output" FieldStat.exe

This would output the Results.xml files in the output/ directory.

The file can be read with a XML parser or with the DataTable.ReadFile( file ) call.

A record is as follows:

<results>
<type>System.IO.Path</type>
<method>GetFileNameWithoutExtension (string)</method>
<length>1</length>
<frequency>2</frequency>
<rankedfrequency>0.30</rankedfrequency>
<appfrequency>1</appfrequency>
<coverage>1</coverage>
</results>

Screenshots

Plugins and Future Directions

Plugin System

Plugins can be created by dropping in a file named *Plugin.dll into the Plugins/ directory.

Design Fragments

An example plugin is included in the source code as the DesignFragmentPlugin project.

A design fragment is a pattern or common usage of a framework. This plugin attempts to detect candidates for design fragments by looking at common sequences of calls. This then in turn could be used to improve the framework, or serve as documentation or snippets in how to use the framework.

For example, the following series of calls was found to occur 10 times in the application repository. It looks like the user is trying to format a date in a particular way:

System.DateTime.get_Day ()
System.Int32.ToString ()
System.DateTime.get_Month ()
System.DateTime.get_Year ()
System.Int32.ToString ()
System.String.get_Length ()
System.String.Concat (string,string)
System.String.get_Length ()
System.String.Substring (int,int)
System.String.Concat (System.String[])

The plugin is written as follows:

using FieldStat.DataCollection;
using FieldStat.CodeModel;

public class DesignFragmentPlugin : AbstractPlugin
{
public override void ComputeResults(Results results, ICollection files, Hashtable htBin)
{
Visit scan = new Visit();
scan.Collectors.Register("DesignFragment", new DesignFragmentCollector());

scan.DoScan( files, htBin);
MyCollector seqs = (MyCollector)scan.Collectors["DesignFragment"];
// Process Results ...
}
}

public class DesignFragmentCollector : AbstractCollector
{
public override void OnMethodBody(MethodBody body)
{
ArrayList seq = GetSystemCallSequences(body);
if (seq.Count > 3)
{
sequences.Add(EncodeSequenceCalls( seq ));
}
}
....
}

SystemSignature

This plugin extracts the system calls made in an application. Then the resulting system call signature is compared against the other application's signatures. The results can be clustered to find common type of applications. It may also be used to judge the implementation of a well-known type of application. For example, many irc clients have been written. How divergent is yours from the other known clients? Maybe you should have taken advantage of a different architecture that you were not aware of. (Plugin under development)

Monday, August 20, 2007

Done. (Sorta)

Yesterday I finished implementing all the features that I had planned (and more).

Some of those features:

* New CodeRank metric where the use of a system call is weight by that function's code rank.

* Provide a non-graphics interface to using tool.

* Provide the ability to write custom plugins for collecting data and creating reports from unit test data and example application repository.

* Enhanced the grid view to visualize coverage information.

I also have a couple more ideas for plugins and how to use the data, but first I want to check in my code and make sure there is sufficient documentation.

Friday, August 3, 2007

Introducing PopTrees (Detecting Message Chains)

I introduce a PopTree as a simple abstraction over CIL and demonstrate how they can be used to find Message Chains.

Detecting Message Chains.

Message chains violate the Law of Dementor[1] by accessing a long series of other object's Methods and Properties. Detecting Message Chains violations in CIL involves understanding the local structure of statements. A large subset of instructions can occur in the context of a passing values to a method call. This means understanding the structure of message chains entails understanding much of the CIL instructions used in passing arguments. In addition, the instructions have to be understood to tell the difference between a sequence of method calls and a chain of method calls.

The problem is that there isn't good support for representing the instructions as more complex statements. Usually this is done ad hoc for the purpose of a person's analysis. Some code in Cecil.FlowAnalysis.ActionFlow has attempted to provide some abstractions over CIL; however, it only implemented a small subset of the instructions. In addition, the design tried prematurely canonicalizing the instruction tree making it difficult to extend to the full instruction set.

Instead, I introduce the concept of a PopTree. A PopTree roughly corresponds to a statement in the original source code. A PopTree is constructed by simulating the instruction stream to push and pop various values on a state machine. An instruction 'owns' another instruction if it pops it's value off the stack. A post-order transversal should closely correspond to the original AST. The hardest part was examining the spec to see how many pops and pushes an given instruction causes.

Algorithm:


public PopTreeList Build( ICollection instructions )
{
    foreach (Instruction i in instructions)
    {
        PopTree p = new PopTree(i);
        int pops = InstructionInfo.Pops(i,_machineStack.Count);
        if (pops > _machineStack.Count)
        {
            // can happen with exception blocks whose support is flaky.
            //Console.WriteLine("Attempting to pop more than machine has: " + i.OpCode.ToString() + " " + i.Operand);
            return null;
        }
        for (int x = 0; x <>
        {
            p.Consume(Pop());
        }
        if (InstructionInfo.Pushes(i))
        {
            Push(p);
        }
        else
        {
            _popTreeList.Add(p);
        }
    }
    return _popTreeList;
}

CODE:
m1( a1() ).m2( a2() ).m3( a3() )

CIL:
ld this ; push [this]
dup ; push [this]
call a1 ; pop [this] -> push [value]
call m1 ; pop [this] pop [value] -> push [obj]
ld this ; push [this]
call a2 ; pop [this] -> push [value2]
call m2 ; pop [obj] pop[value2] -> push [obj2]
ld this ; push [this]
call a3 ; pop [this] -> push [value3]
call m3 ; pop [obj2] pop [value3] -> push [obj3]

POP TREE:


               [m3]
          [m2]    [a3]
       [m1]   [a2]    [this] 
  [this]  [a1]   [this] 
             [this]

As an example, I have written an scanner that can find message chains in source code. This works decently well. However, I have to fix the Cecil.FlowGraph.ControlFlow model so that it can properly handle more advanced control flow -- that way things will work on real-world assemblies.

Another thing that should be easy is to determine all the variables used in the context of an if statement. This should make doing branch coverage and other analysis needing to get predicates easier to perform.

[1] http://en.wikipedia.org/wiki/Law_of_Demeter

Friday, July 27, 2007

Stats

Percentage Used Methods Covered: 0.2572512

Of the system methods used in the wild, about 75% of them are uncovered/ no unit tests.

Top10 Frequent Uncovered Methods:

System.Type.GetTypeFromHandle(runtimetypehandle)[9666],
System.Runtime.InteropServices.UCOMIConnectionPoint.Unadvise (int)[5629],
System.Runtime.InteropServices.UCOMIConnectionPoint.Advise (object,int&)[5509],
System.String.get_Length ()[4866],System.Object.GetType ()[2976],
System.Runtime.InteropServices.GCHandle.Alloc (object,gchandletype)[2744],
System.Runtime.InteropServices.GCHandle.AddrOfPinnedObject ()[2708],
System.String.get_Chars (int)[1751],System.Type.get_FullName ()[1706],
System.Data.DataRow.get_Item (string)[1590]

Top10 Types Used:

System.Windows.Forms.Control: 39031,
System.String: 37039,
System.Collections.ArrayList: 33015,
System.Threading.Monitor: 29379,
System.Type: 15596,
System.Collections.IEnumerator: 12835,
System.Runtime.InteropServices.UCOMIConnectionPoint: 11138,
System.Collections.Hashtable: 10776,
System.Delegate: 10191,
System.Runtime.InteropServices.GCHandle: 10056

Correctness+Completeness

My last work focused on feasibility and demonstrating the concept. This time I focused on correctness and completeness. For example, the analysis didn't distinguish overloads. Plus I suspected that only some of the coverage data was being joined with the usage data. I also worked on some minor enhancements like getting the class library method length.

Turns out the monocov tool outputted the method name without any parameters if it didn't have coverage data. I was only matching against ones without coverage. If it did have coverage data, it would generate a name like:
MethodName (string,byte[],System.Data.DataTable)

I have my tool match this same format so that it could work with the overloads. Unfortunately, this results in some custom code because the output shortens certain types like:
String.String-> string
String.Int32[] -> int[]
System.Object -> object
so I had to do the same.

Currently there is still a problem if the type is a reference (System.String&)

I also fixed monocov so that it would be consistent in how it would output method names without coverage data. Otherwise, it would just repeat the same method name for output.
I also fixed a bug in my simplified exporter that would use the same class name if more than one class was in the source file. While I was in monocov, I also added the feature to be able to export the length of the method.

Now that I can generate much more sound results, I will be calculating some statistics and be posting them shortly.

Friday, July 13, 2007

Surprised?

Unit Testing Effective?

Apparently there is an opposite correlation between methods used in the field, and unit testing coverage. That is, the methods that the unit test cover, were not used in the field. The methods that were used, had no coverage.

A larger sampling and verification of the results is necessary, however it is an interesting result so far.

Data Collection Methods

Over 500 projects were downloaded from the code.google.com project. The projects were selected based on the label: Mono or CSharp. In addition, 2 projects from a company were included. The projects were manually built, or a binary distribution was acquired. Some projects had to be excluded due to immaturity(not building), misclassification, and lacking the appropriate resources to build the project.

Results

For this run, only 35 of the projects were included in these results.

The information shown includes the type.method and corresponding:

Frequency across all applications.
Number of applications that the type.method appears in.
Coverage for that type.method. (1 = 100%, 0.50 = 50%, 0 = 0%, -1 = no test data)

Note: Coverage data was not collected for Managed.Windows.Form at this time.

One thing I think would help would be to display the (lines of code, complexity of method) in order to estimate how hard the method is to test or if it is worth testing.

Viewing the Results Yourself

I uploaded the results here:
http://groups.google.com/group/mono-soc-2007/web/FieldStatResults.xml

You can get the mono student projects from here
svn checkout http://mono-soc-2007.googlecode.com/svn/trunk/ mono-soc-2007

Compile from source or run the executable (Relies on WindowForms) under the directory:
christopher/FieldStat/FieldStat/bin/Release/FieldStat.exe

You can easily load the results under the Results tab using the(Import Results) button. Load the xml file on the web, or in christopher/FieldStat/Data/Results/FieldStatResults.xml

The default sort is on AppFrequency, you can resort on the other fields by clicking the columns.

Lessons and Design Issues

My original focus was initially just on accurately getting usage results from one executable and comparing that to the coverage data. However, I soon realized whenI was working with real data that the real challenge was with dealing with many applications and how the information varies across them. There are some interesting emergent problems. For instance, an application uses the log4net library. Is that treated as part of the application? What if several applications are linking that library.

The collection of the applications for sampling field data should be a project in itself. There should be easy automated way to scan code.google.com, sourceforge, etc and gather executables as samples.

Tuesday, July 3, 2007

Prototype Complete

I was recently squirreled away in the Rocky Mountains of Canada, so I couldn't make updates over the wire. However, I have successful completed the prototype mono coverage tool.

So far, I have only ran the program on a small set of data and coverage data, so I cannot make any bold statements about any glaring mismatches in unit testing. However, I do have some observations.

Statements that are uncovered, tend to be paths for exceptional behavior.
I think more levels of coverage should be supported: Branch, conditionals, etc.
The monocov tool could use a good round of refactoring.

Some stumbling blocks included

extra hoops in building mono as a non-root user (prefixes, no ldconfig, etc)
The monocov tool's export was intended for pretty presentation, rather than exporting to other tools.

Outstanding tasks include:

Frequency information isn't accounting for overloads.
Need to collect all coverage information and run the tool over several field binaries.

Phase 2 developments steps include implementing CodeRank to weight the frequency, and introducing an alternative user interface for viewing the data.

Summer of Code