Reasoning over generated code
At Raincode our tools are written in a variety of languages, using the programming language best suited for the task at hand. The setup of interest for this internship is our in-house language YAFL, used for the implementation of the compiler, and C#, used for the language runtime.
The code generated by the compiler will at some points include calls to the runtime. Hence the YAFL code generates code that performs calls to these functions, for example as follows:
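Since YAFL is an in-house language, the following is a minimal Python sketch (not actual YAFL; all identifiers are illustrative, mirroring the names discussed below) of the kind of code-generation call involved:

```python
# Minimal sketch in Python (not actual YAFL) of generating the source text
# of a call to a C# runtime method. All identifiers are illustrative.

def gen_const_int(value):
    """Generate a constant integer expression."""
    return str(value)

def gen_method_call(dll_name, class_name, target, method_name, args):
    """Generate a call to `method_name` of `class_name` (found in `dll_name`)
    on the result of the target expression, with the given arguments."""
    return f'[{dll_name}]{class_name}.{method_name}({target}, {", ".join(args)})'

idx = gen_const_int(42)
code = gen_method_call("DLLName", "ClassName", "Target", "MethodName", [idx])
print(code)  # [DLLName]ClassName.MethodName(Target, 42)
```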
The above specifies the generation of code that calls a C# method ‘MethodName’ on the result of a target expression ‘Target’, which is of the class ‘ClassName’ that is found in the .dll ‘DLLName’. We give the method call one argument: the constant integer expression contained in the variable ‘idx’.
The complexity we wish to address in this internship is reasoning over the generated code, e.g. to ensure that at least the method names and their arities are correct. This is necessary because we are faced with a number of important issues:
- If there is a type error in the generated function call, this is only revealed at a very late stage of the build process, or even worse: when the compiler runs on a client machine.
- Given this late stage in the process, it is extremely complex to trace the error back to the code that produced the code that produced it (note that the previous is not a typo), since there is no clear trace back to the origin.
- All the methods of the runtime that are called by the compiler form an API, but it is not clear how they are used, if at all. This makes it hard to maintain and evolve.
The work for this internship consists of addressing these issues, firstly by making these calls more explicit, as follows:
- The runtime methods that comprise the external API are annotated with an attribute (to be defined)
- A small C# tool then produces the source code for a Factory in YAFL, which takes care of generating the code that calls the runtime.
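The two steps above can be sketched as follows. This is a hedged Python analogue: the real tool would scan C# attributes via reflection and emit YAFL source, and all names here are assumptions.

```python
# Python analogue of the proposed setup: a decorator stands in for the C#
# attribute marking runtime methods as external API, and a generator emits
# one factory entry per annotated method. All names are illustrative.
import inspect

RUNTIME_API = []  # registry of annotated runtime methods

def runtime_api(func):
    """Stand-in for the C# attribute that marks a method as external API."""
    RUNTIME_API.append(func)
    return func

@runtime_api
def AppendArg_Index(idx):
    """Runtime method; only its signature matters for factory generation."""

def emit_factory_source():
    """Emit the source of a Factory with one entry per annotated method."""
    lines = []
    for f in RUNTIME_API:
        params = ", ".join(inspect.signature(f).parameters)
        lines.append(f"def call_{f.__name__}({params}): ...")
    return "\n".join(lines)

print(emit_factory_source())  # def call_AppendArg_Index(idx): ...
```

A compile-time check of method names and arities then reduces to checking calls against the generated Factory, instead of against strings scattered through the compiler.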
With the above in place, a number of the following extensions will be implemented, depending on the choice of the intern:
- Write a refactoring tool for YAFL code that rewrites code such as the example above into calls to the Factory.
- Alternatively, rewrite the code by hand.
- Examine cases where the method name is constructed instead of being a single hardcoded string, and establish if and how these can be changed to use the Factory.
- Create specific Factory types with meaningful methods (e.g. ‘AppendArg_Index’ in the example above).
- Perform code coverage analysis to ensure that we have tests for all the methods called from the current version of the compiler.
- Load the actual method name to invoke from an external source (e.g. an XML file), to let old compilers compile for a more recent runtime while moving old methods out of the way.
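The last extension can be sketched as follows; the XML layout and the attribute names are assumptions, chosen only for illustration.

```python
# Sketch (Python): resolving the actual runtime method name to invoke from
# an XML mapping, so an old compiler can target a newer runtime. The file
# layout ("logical"/"actual" attributes) is an illustrative assumption.
import xml.etree.ElementTree as ET

MAPPING_XML = """
<methods>
  <method logical="AppendArg_Index" actual="AppendArgument_ByIndex"/>
</methods>
"""

def resolve_method(logical_name):
    """Return the actual method name for a logical one, defaulting to it."""
    root = ET.fromstring(MAPPING_XML)
    for m in root.iter("method"):
        if m.get("logical") == logical_name:
            return m.get("actual")
    return logical_name  # no mapping entry: the name is unchanged

print(resolve_method("AppendArg_Index"))  # AppendArgument_ByIndex
```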
Transparent remote development and compilation
At Raincode we are, at times, faced with a client setup that is a mix between different computer environments, e.g. some servers with a Linux operating system, a mainframe, and a set of development workstations running Windows. A core axiom of our way of working is the concept of the single source: the same source file should compile without changes on the different environments. This means that development and maintenance of a (part of) an application always happens on the same sources, regardless of the target environment.
An important challenge in these setups is enabling straightforward treatment of remote and local sources with regard to development and compilation. For a particular client we have already developed a one-off solution that helps in their context. For this client and other clients, simply obtaining all the files that are kept in a version control (VC) or source code management (SCM) system (e.g. via git clone) is not an option. These repositories are huge, so keeping a complete copy locally in all the places they are used is a significant and unnecessary waste of resources.
The topic of this internship is to develop a more generic solution, based on the idea of redirect files. The core idea of this solution is to, locally, have a tree of redirect files that matches the tree of source code files, which is kept remotely in a VC or SCM. A redirect file is a proxy: nothing more than a text file that specifies the location of the actual source code file. The compiler and the development environment will work with these redirect files as if they were the actual source code files themselves, as transparently as possible. The tree of redirect files is also kept in VC/SCM, obviously.
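A redirect file and its resolution can be sketched as follows; the one-line "redirect: repository#path" format and the helper names are assumptions for illustration, not an existing convention.

```python
# Sketch (Python): reading a source file through a redirect file.
# The redirect format "redirect: <repository>#<path>" is an assumption.
import os
import tempfile

def read_source(path, fetch):
    """Return file contents, transparently following a redirect file.
    `fetch(location)` retrieves the actual file from the VC/SCM."""
    with open(path) as f:
        text = f.read()
    if text.startswith("redirect:"):
        return fetch(text[len("redirect:"):].strip())
    return text  # a plain local source file

# Usage: the local tree holds a redirect file instead of the real source.
fake_repo = {"repo.git#src/main.cob": "IDENTIFICATION DIVISION."}
with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "main.cob")
    with open(p, "w") as f:
        f.write("redirect: repo.git#src/main.cob")
    result = read_source(p, fake_repo.get)
print(result)  # IDENTIFICATION DIVISION.
```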
In more detail, the tools work with redirect files in the following manner:
- The compiler will fetch the appropriate sources from the VC/SCM as needed and compile locally.
- The development environment can create files that are only local or also kept in the VC/SCM, the latter also amending the tree of redirect files.
- The development environment will fetch the appropriate sources from the VC/SCM as needed and edit locally.
- When saving, the user can choose between saving a local copy or uploading to the VC/SCM, the latter with conflict resolution if needed.
- The compiler, when faced with a file that is local but also remote, needs to establish whether there is a conflict and also perform conflict resolution.
- Conflict resolution may be non-trivial in this setting, since it depends on which tool detects the conflict, as well as in which context it runs. For the development environment, conflict resolution may start with some default strategies but may also include interacting with the user. For the compiler, however, conflict resolution needs to happen without user interaction when running standalone. When the compiler runs as part of a build within the development environment, on the other hand, all conflict resolution facilities of the development environment need to be used.
- Typical use cases for this setup are the following:
- Development on workstations and a remote build farm building executables for the workstation
- Development on workstations and testing and production on a mainframe
- Applications with parts that are shared between different applications, on a mainframe VC/SCM and parts specific to that application that are contained in a separate VC/SCM.
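The context-dependent choice of conflict-resolution strategy described above can be sketched as follows; this is a Python sketch and the strategy names are illustrative assumptions.

```python
# Sketch (Python): selecting a conflict-resolution strategy from the tool
# and the context it runs in. Strategy names are illustrative.

def choose_resolution(tool, interactive):
    """Pick how to resolve a local-vs-remote conflict."""
    if tool == "compiler" and not interactive:
        return "fail-build"           # standalone compiler: never prompt
    if tool == "compiler" and interactive:
        return "delegate-to-ide"      # build inside the IDE: reuse its facilities
    if tool == "ide":
        return "default-then-prompt"  # try default strategies, then ask the user
    raise ValueError(f"unknown tool: {tool}")

print(choose_resolution("compiler", interactive=False))  # fail-build
print(choose_resolution("ide", interactive=True))        # default-then-prompt
```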
We expect the intern to design and develop this solution to work with one of our compilers, one specific VC/SCM (to be decided) and Visual Studio or Visual Studio Code (to be decided).
Static SQL Performance Predictor
The Raincode compilers include a capability to convert SQL statements found in COBOL and PL/I programs from DB2’s SQL dialect to the (vastly different) SQL Server dialect. While this transformation is designed to guarantee functional equivalence, the performance of the transformed SQL statement may be significantly different. This is mainly due to the difference in optimization strategies implemented by the two database engines, and to transformed statements preventing a database optimizer from using indexes adequately. A typical example of performance degradation is when the target database fails to use an index while the original database did.
The purpose of this work is to design and implement a tool that will statically predict which converted SQL statements will perform significantly slower than the original ones. At Raincode we have access to, and need to process, large portfolios of code with embedded SQL statements that are part of migration projects. The work would use such portfolios as input that guides the development direction of the tool.
This tool will:
- Take existing DB2 SQL statements, ask DB2 for the query plan, and analyze the results to calculate a performance score.
- Take the translated SQL Server statements, ask SQL Server for the query plan, and analyze the results to calculate a performance score.
- Report the difference in performance score, and report which (parts of the) SQL statements may suffer a significant performance penalty.
- Optionally, propose a means to improve performance.
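The comparison step can be sketched as follows. This is a toy Python sketch: real query plans are XML documents returned by DB2 and SQL Server, and the cost extraction and the 2x regression threshold are assumptions.

```python
# Sketch (Python): deriving performance scores from two query plans and
# flagging significant regressions. Plan structure, cost model and the
# threshold are illustrative assumptions.

def performance_score(plan):
    """Toy score: sum of estimated operator costs in a parsed plan."""
    return sum(op["cost"] for op in plan["operators"])

def flag_regression(db2_plan, sqlserver_plan, threshold=2.0):
    """Report whether the translated statement looks significantly slower."""
    before = performance_score(db2_plan)
    after = performance_score(sqlserver_plan)
    return {"before": before, "after": after,
            "regression": after > threshold * before}

# Usage: an index scan in DB2 degrades to a table scan in SQL Server.
db2 = {"operators": [{"op": "IXSCAN", "cost": 1.0}]}
mssql = {"operators": [{"op": "Table Scan", "cost": 40.0}]}
print(flag_regression(db2, mssql))
```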
The final ambition of this work is to pave the way to a tool that runs on large-scale portfolios in migration projects, so that possible performance hotspots can be detected and remedied without having to go through lengthy testing, profiling and debugging sessions.
Visual Studio Plugin for C# Test Smell Detection
Test smells are a variety of bad smells that are specific to test code [1]. As reported by van Deursen et al. [1]: “Test code has a distinct set of smells, dealing with the ways in which test cases are organized, how they are implemented, and how they interact with each other.” Some examples are (taken verbatim from [1]):
- Assertion Roulette. “Guess what’s wrong?” This smell comes from having a number of assertions in a test method that have no explanation. If one of the assertions fails, you do not know which one it is.
- Indirect Testing. A test class is supposed to test its counterpart in the production code. It starts to smell when a test class contains methods that actually perform tests on other objects (for example because there are references to them in the class-to-be-tested).
- Test Code Duplication. Test code may contain undesirable duplication. In particular the parts that set up test fixtures are susceptible to this problem.
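A detector for the first smell can be crude and still useful. The following Python sketch (the eventual tool targets C#, e.g. via Roslyn) flags test code containing several assertions without an explanatory message; the regex heuristic and threshold are illustrative assumptions.

```python
# Sketch (Python, target language is C#): a crude Assertion Roulette
# detector. An Assert.* call without a string literal among its arguments
# is assumed to lack an explanatory message. Heuristics are illustrative.
import re

def assertion_roulette(test_source, max_bare_asserts=1):
    """Return True if the test has too many message-less assertions."""
    bare = 0
    for line in test_source.splitlines():
        m = re.search(r"Assert\.\w+\((.*)\);", line)
        if m and '"' not in m.group(1):  # no string literal => no message
            bare += 1
    return bare > max_bare_asserts

smelly = '''
Assert.AreEqual(3, result.Count);
Assert.IsTrue(result.Sorted);
'''
print(assertion_roulette(smelly))  # True
```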
Recent research [2] has shown that smelly tests are actually correlated with code of lower quality in two ways:
- Smelly tests are more change prone and more defect prone than tests that are not. In addition, test methods with more smells are more change prone.
- The code that is tested by smelly tests is also more change prone and more defect prone. This effect on production code is a strong argument against the presence of bad test smells.
It is therefore doubly interesting to avoid writing smelly tests. Not only will the tests themselves have fewer defects and change less; more importantly, the production code will be less buggy and change less.
The goal of this work is to have both the current and future tests of our in-house tools smell like roses. A first step is to build a bad smell detection tool for C# code that can be run on our codebase. Yet experience with static analysis tools [3] has shown that this typically does not suffice. What is needed in addition is a good means to motivate developers to write better code. One way that works is continuous feedback in their development environment. So a second step of this work is to write a plugin for the Visual Studio IDE that reveals the smelliness of tests while the developer is writing them. A third step would then be to implement refactorings that allow the plugin to propose a way to eliminate (some of) the bad smells, e.g. starting with the refactorings described in [1].
[1] Van Deursen A, Moonen L, Van Den Bergh A, Kok G. “Refactoring test code”. In Proceedings of the 2nd International Conference on Extreme Programming and Flexible Processes in Software Engineering (XP2001), 2001, pp. 92-95.
[2] Spadini D, Palomba F, Zaidman A, Bruntink M, Bacchelli A. “On the relation of test smells to software code quality”. In Proceedings of the International Conference on Software Maintenance and Evolution (ICSME). IEEE, 2018.
[3] Harman M, O’Hearn P. “From Start-ups to Scale-ups: Opportunities and Open Problems for Static and Dynamic Program Analysis”. In Proceedings of the 18th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM). IEEE, 2018.