Stress-testing compilers by automated innocuous changes
The goal of this thesis is to design and implement behaviour-preserving transformations, to be applied to test programs, resulting in modified programs that behave identically to the originals. Examples of such transformations include (but are not limited to)
• adding to integer expressions subexpressions that are guaranteed to evaluate to zero,
• similarly, multiplying them by expressions that are guaranteed to evaluate to 1 (in such a way that the compiler cannot statically detect and exploit this fact),
• adding extraneous parentheses or other bracing constructs,
• adding control flow branches that are never executed, given the expected input values of the programs,
• replacing structured constructs by more elementary ones using labels and GO TO statements, etc.
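To illustrate the idea, here is a minimal sketch of the first two transformations. It operates on Python expressions for the sake of a self-contained example; the actual work would of course target COBOL and PL/I through Raincode's meta-programming technology.

```python
# Sketch: wrap every integer literal n as (n + 0) * 1 -- a semantic no-op --
# then check that the mutated expression still evaluates like the original.
import ast

class InnocuousChanges(ast.NodeTransformer):
    """Rewrite each integer literal n into (n + 0) * 1."""
    def visit_Constant(self, node):
        if isinstance(node.value, int) and not isinstance(node.value, bool):
            plus_zero = ast.BinOp(left=node, op=ast.Add(), right=ast.Constant(0))
            return ast.BinOp(left=plus_zero, op=ast.Mult(), right=ast.Constant(1))
        return node

def mutate(expr_src: str) -> str:
    tree = ast.parse(expr_src, mode="eval")
    tree = ast.fix_missing_locations(InnocuousChanges().visit(tree))
    return ast.unparse(tree)

original = "2 + 3 * x"
mutated = mutate(original)
# Both versions must evaluate identically for any input value of x:
assert eval(original, {"x": 7}) == eval(mutated, {"x": 7})
print(mutated)
```

The check at the end is exactly the oracle the thesis relies on: a compiler that treats the original and the mutated program differently has a bug.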
This work would be developed using Raincode’s proprietary meta-programming technology, and would be applied to the extensive regression testing infrastructure for our COBOL and PL/I compilers. This will provide extra correctness checks for the compilers, and possibly uncover bugs in them, as has been done previously for GCC and LLVM by Zhendong Su et al. (see the ICSME 2017 keynote: https://icsme2017.github.io/program/keynotes.html). Furthermore, it will show that this approach is feasible for languages that are syntactically more complex than those targeted by current work.
Code Slicing by Program Transformation
Program slicing is a well-researched technique for analysing dependencies: given a variable, its forward slice contains all executable statements that access it or anything dependent on it. The original idea was for slices to be executable, but in practice that is rarely a hard requirement (often it is enough for them to be compilable).
Usually slicing is done by creating an abstract syntax tree, analysing it to produce all kinds of useful but computationally expensive artefacts such as dependence graphs, and then performing slicing as trivial reachability operations on those graphs. However, it can also be done much more cheaply and quickly, by transforming programs and borrowing ideas from partial evaluation and supercompilation: simply put, by assuming that all variables besides the interesting ones are constants.
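The core idea can be sketched on a toy intermediate representation, in which each statement is just a pair of the variable it defines and the variables it uses. This is a hypothetical simplification; the real slicer would operate on COBOL or PL/I syntax trees.

```python
# Minimal forward-slice sketch on straight-line code. Each statement is a
# (defined_var, used_vars) pair; variables outside the slice are treated as
# constants, so their defining statements can be dropped.
def forward_slice(stmts, var):
    tainted = {var}  # variables (transitively) dependent on `var`
    kept = []
    for i, (defined, used) in enumerate(stmts):
        if defined == var or tainted & set(used):
            tainted.add(defined)
            kept.append(i)
    return kept

program = [
    ("a", []),          # a = input()
    ("b", ["a"]),       # b = a + 1
    ("c", []),          # c = 42
    ("d", ["b", "c"]),  # d = b * c
    ("e", ["c"]),       # e = c - 1
]
print(forward_slice(program, "a"))  # [0, 1, 3]: only c and e are independent of a
```

A single linear pass suffices here because the toy program has no control flow; handling loops, branches and procedure calls is where the interesting research questions begin.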
The goal of this project is to implement such a slicer in an industrial metaprogramming language, test it on actual codebases, and investigate its advantages and limitations.
Retargeting the Assembler Compiler
One of Raincode’s compilers has an unprecedentedly modular structure and relies on principles of model-driven engineering: the instruction set of the language is modelled at a level high enough to generate several interrelated artefacts needed to parse the language and support its runtime execution.
One of the artefacts generated that way is essentially microcode: the semantics of each instruction is expressed as a sequence of almost-atomic steps, which are then used in another round of code generation to produce code in C# (for the generated emulator) and in CIL (for performance-targeted inlining).
The goal of the project is to retarget this system to generate code in C, so that an emulator can be generated and compiled by GCC on Linux. Some degree of success is already guaranteed by the straightforwardness of the first steps (we know it is possible to generate C; this is why the microcode language was developed in the first place), but the project contains enough hidden challenges of both a technical and a scientific nature.
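The retargeting idea can be sketched as a table of per-micro-op emitters, here producing C. The micro-op names and the instruction below are purely illustrative; the real microcode language is proprietary.

```python
# Sketch: one string-producing emitter per micro-op; an instruction is a list
# of almost-atomic steps that gets rendered into a C function body.
MICRO_OPS = {
    "load":  lambda dst, src: f"{dst} = mem[{src}];",
    "add":   lambda dst, a, b: f"{dst} = {a} + {b};",
    "store": lambda dst, src: f"mem[{dst}] = {src};",
}

def emit_c(instruction_name, steps):
    body = "\n".join("    " + MICRO_OPS[op](*args) for op, *args in steps)
    return (f"static void {instruction_name}(uint32_t *mem, uint32_t *r) {{\n"
            f"{body}\n}}")

# A hypothetical add-from-memory instruction expressed as three steps:
print(emit_c("op_add_mem", [
    ("load",  "r[1]", "r[2]"),
    ("add",   "r[0]", "r[0]", "r[1]"),
    ("store", "r[3]", "r[0]"),
]))
```

Retargeting then amounts to swapping this emitter table for a C#- or CIL-producing one; the hidden challenges lie in the micro-ops whose semantics do not map onto C as directly as these three.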
• V. Blagodarov, Y. Jaradin, V. Zaytsev. Tool Demo: Raincode Assembler Compiler. SLE 2016, pp. 221–225. DOI: 10.1145/2997364.2997387
• The Raincode ASM370 compiler for Microsoft .NET, https://www.raincodelabs.com/products/asm370/
Static comparison of relational database optimization schemes applied to large portfolios
The Raincode compilers include a capability to convert SQL statements found in COBOL and PL/I programs from DB2’s SQL dialect to the (vastly different) SQL Server dialect. While this transformation is designed to guarantee functional equivalence, the performance of the transformed SQL statement may differ significantly. This is mainly due to the differences in optimization strategies implemented by the various database engines, and to the entropic nature of the transformations, which may prevent a database optimizer from using indexes adequately.
A typical example of performance degradation is when the target database fails to use an index that the original database did use.
The purpose of this thesis is to design a system that will statically predict which converted SQL statements will perform significantly slower than the original ones. To achieve such predictions, it will
• take existing SQL statements (possibly in the tens of thousands), each with its translated counterpart,
• query both databases for their respective query plan,
• and report those statements that may suffer a significant performance penalty due to the translation.
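The reporting step could be sketched as follows, assuming the query plans have already been fetched (e.g. via DB2's EXPLAIN facility and SQL Server's SET SHOWPLAN_XML) and reduced to lists of plan operator names. The operator names and the heuristic below are illustrative; a real tool would compare much richer plan features.

```python
# Sketch: flag translated statements whose plan lost index usage that the
# original plan had. Operator names are an illustrative subset.
def plan_uses_index(operators):
    index_ops = {"IXSCAN", "INDEX SEEK", "INDEX SCAN"}
    return any(op.upper() in index_ops for op in operators)

def flag_regressions(statements):
    """statements: iterable of (stmt_id, db2_plan_ops, sqlserver_plan_ops)."""
    return [stmt_id
            for stmt_id, db2_ops, mssql_ops in statements
            if plan_uses_index(db2_ops) and not plan_uses_index(mssql_ops)]

suspects = flag_regressions([
    ("Q1", ["IXSCAN", "FETCH"], ["Index Seek"]),  # index kept: fine
    ("Q2", ["IXSCAN"], ["Table Scan"]),           # index lost: report
])
print(suspects)  # ['Q2']
```

The scientific content of the thesis lies in replacing this crude index-presence heuristic with a predictor that is reliable across tens of thousands of statements.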
The final ambition of this thesis is to pave the way towards a tool that would run on large-scale portfolios in migration projects, so that possible performance hotspots can be detected and remedied without lengthy testing, profiling and debugging sessions.
Visual Studio Plugin for C# Test Smell Detection
Test smells are a variety of bad smells that are specific to test code [1]. As reported by van Deursen et al.: “Test code has a distinct set of smells, dealing with the ways in which test cases are organized, how they are implemented, and how they interact with each other.” Some examples are (taken verbatim from [1]):
– Assertion Roulette. “Guess what’s wrong?” This smell comes from having a number of assertions in a test method that have no explanation. If one of the assertions fails, you do not know which one it is.
– Indirect Testing. A test class is supposed to test its counterpart in the production code. It starts to smell when a test class contains methods that actually perform tests on other objects (for example because there are references to them in the class-to-be-tested).
– Test Code Duplication. Test code may contain undesirable duplication. In particular the parts that set up test fixtures are susceptible to this problem.
Recent research [2] has shown that smelly tests are actually correlated with code of lower quality in two ways:
1) Smelly tests are more change prone and more defect prone than tests that are not. In addition, test methods with more smells are more change prone.
2) The code that is tested by smelly tests is also more change prone and more defect prone. This effect on production code is a strong argument against the presence of bad test smells.
It is therefore doubly interesting to avoid writing smelly tests. Not only will the tests themselves have fewer defects and change less, but more importantly, production code will be less buggy and change less.
The goal of this work is to make both the current and future tests of our in-house tools smell like roses. A first step is to build a bad smell detection tool for C# code that can be run on our codebase. Yet, experience with static analysis tools [3] has shown that this typically does not suffice. What is needed in addition is a good means to motivate developers to write better code. One way that works is through continuous feedback in their development environment. So a second step of this work is to write a plugin for the Visual Studio IDE that reveals the smelliness of tests as the developer writes them. A third step would then be to implement refactorings that allow the plugin to propose a way to eliminate (some of) the bad smells, e.g. starting with the refactorings described in [1].
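As a taste of the first step, here is a minimal sketch of one detector (Assertion Roulette) for NUnit-style C# tests. It is regex-based and written in Python purely for illustration; a real detector, and certainly the Visual Studio plugin, would use Roslyn syntax trees instead.

```python
# Sketch: flag [Test] methods containing several Assert calls that carry no
# explanatory message (the "Assertion Roulette" smell).
import re

CSHARP = """
[Test]
public void TestParser() {
    Assert.AreEqual(3, result.Count);
    Assert.IsTrue(result.Valid);
    Assert.AreEqual("x", result.Name, "name should survive parsing");
}
"""

def assertion_roulette(source, threshold=2):
    smells = []
    method = r"\[Test\]\s*public\s+void\s+(\w+)\s*\(\)\s*\{(.*?)\n\}"
    for m in re.finditer(method, source, re.S):
        name, body = m.group(1), m.group(2)
        asserts = re.findall(r"Assert\.\w+\((.*?)\);", body)
        # An assertion is "unexplained" if its last argument is not a string.
        unexplained = [a for a in asserts if '"' not in a.rsplit(",", 1)[-1]]
        if len(unexplained) >= threshold:
            smells.append(name)
    return smells

print(assertion_roulette(CSHARP))  # ['TestParser']
```

Here two of the three assertions lack a message, so the method is flagged; lowering or raising the threshold trades precision against recall, which is itself a question worth evaluating on our codebase.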
[1] Van Deursen A, Moonen L, Van Den Bergh A, Kok G. “Refactoring Test Code”. In Proceedings of the 2nd International Conference on Extreme Programming and Flexible Processes in Software Engineering (XP2001), 2001, pp. 92–95.
[2] Spadini D, Palomba F, Zaidman A, Bruntink M, Bacchelli A. “On the Relation of Test Smells to Software Code Quality”. In Proceedings of the International Conference on Software Maintenance and Evolution (ICSME). IEEE, 2018.
[3] Harman M, O’Hearn P. “From Start-ups to Scale-ups: Opportunities and Open Problems for Static and Dynamic Program Analysis”. In Proceedings of the 18th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM). IEEE, 2018.