
INTRODUCTION
Raincode S.P.R.L. is a leading company in compiler design and, more generally, language processing infrastructure. We are headquartered in Brussels, Belgium, and serve customers in all major regions of the globe. We have a sales and support office serving the North American market, and a development facility in Bangalore, India.
Our organization is divided into two commercial activities:
INTERNSHIP
Brussels
Our main engineering centre is located in downtown Brussels, Belgium, close to the historical city centre, within walking distance from Brussels main train station (Gare du Midi/Zuidstation) for convenient connections to cities such as Paris, Amsterdam, Frankfurt and London.
Intellectually Rich Environment
There is more to internships than just the actual work. Significant value comes from being immersed in an exciting and challenging environment. To ensure maximum synergy between our team and each intern, full-time presence in Brussels for the duration of the internship is strongly advised, but more flexible schemes can be discussed on a case-by-case basis. (Not 99, or 99.99, but 100%!)
Mentoring
Interns work under the direct supervision of a mentor, who is responsible for both the scientific and the logistics aspects of the internship. Interns are integrated into the team as extensively as possible, mingling with the entire staff (breakfast, lunch, various social occasions). For the duration of their internships, they are an integral part of our staff.
Consequently, both the freedom and obligations that come with regular staff positions apply to interns:
1. Flexible working hours, including remote work, when and as agreed upon with their mentor
2. Casual dress code
3. Professionalism, courtesy, trust and solidarity

Implementation and theory
Internships at Raincode Labs are never about theory alone, nor are they only about the finished product. They always combine a strong theoretical component with an equally strong focus on a working implementation.
Part of the value of a Raincode Labs internship lies in our focus on a production-level implementation, high quality standards and systematic test infrastructures. This stands in stark contrast with purely academic implementation efforts, which often aim only at demonstrating some level of feasibility, without caring for the quality or the practicality of the resulting implementation.
Production and research: a delicate balance
Most Raincode Labs internship topics have a strong relationship with industry, and focus on solving a real-world problem. However, our attention to industrial applicability does not mean that interns are considered cheap labour for customer paying projects.
More specifically, our guarantee to our customers is that everyone working on a paid project is a fully qualified professional, while our guarantee to interns is that they will not be exploited for commercial purposes. The topics always show some level of industrial interest and applicability, but we maintain a Chinese wall between academic investigations and industrial projects.
Selected publications
Parsing in a Hostile World, Darius Blasband, WCRE 2001: 291-300.
Hard Facts vs. Soft Facts, Darius Blasband, WCRE 2008: 301-304.
Software Language Identification with Natural Language Classifiers, Juriaan Kennedy van Dam, Vadim Zaytsev, SANER 2016: 624-628.
Raincode assembler compiler (tool demo), Volodymyr Blagodarov, Yves Jaradin, Vadim Zaytsev, SLE 2016: 221-225.
The Rise and Fall of Software Recipes, Darius Blasband, Reality Bites Publishing 2016, ISBN 978-9-490-78342-6, pp. 1-368.
Language Design with Intent, Vadim Zaytsev, MoDELS 2017: 45-52.
Towards a Taxonomy of Grammar Smells, Mats Stijlaart, Vadim Zaytsev, SLE 2017: 43-54.
Parser Generation by Example for Legacy Pattern Languages, Vadim Zaytsev, GPCE 2017: 212-218.
Live Robot Programming: The language, its implementation, and robot API independence, Miguel Campusano, Johan Fabry, Elsevier SCP 2017, v.133: 1-19.
Open Challenges in Incremental Coverage of Legacy Software Languages, Vadim Zaytsev, PX/17.2: 1-6.
Master thesis topics and internships
This section lists topics for internships that can extend to a master’s thesis, if so required by the student’s university. They are merely starting points: more often than not, the actual work that is being performed ends up diverging significantly from the original plan.
Reasoning over generated code
At Raincode our tools are written in a variety of languages, using the programming language that is most adequate for the task at hand. The setup of interest for this internship is our in-house language YAFL used for the implementation of the compiler and C# for a language runtime.
The code generated by the compiler will at some points include calls to the runtime. Hence the YAFL code generates code that performs calls to these functions, for example as follows:
FuncCall.CREATE (“[DLLName]ClassName::MethodName”);
FuncCall.SetObject (Target);
FuncCall.AppendArgument (DotNetFactory.MakeIntExpression(idx));
…
The above specifies the generation of code that calls a C# method ‘MethodName’ on the result of a target expression ‘Target’, which is of the class ‘ClassName’ that is found in the .dll ‘DLLName’. We give the method call one argument, the constant integer expression contained in the variable ‘idx’.
The complexity we wish to address in this internship is reasoning over the generated code, e.g. to ensure that at least the method names and their arities are correct. This is because we are faced with a number of important issues:
- If there is a type error in the function call, this is revealed at a very late stage of the build process, or even worse: when the compiler runs on a client machine.
- Given this late stage in the process, it is extremely complex to trace back from the cause of the error to the code that produced the code that produced it (note that the previous is not a typo), since there is no clear trace back to the origin.
- All the methods of the runtime that are called by the compiler form an API, but it is not clear how they are used, if at all. This makes it hard to maintain and evolve.
The work for this internship consists in addressing these issues by firstly making these calls more explicit, as follows:
- The runtime methods that comprise the external API are annotated with an attribute (to be defined)
- A small C# tool then produces the source code for a Factory in YAFL, which takes care of generating the code that calls the runtime.
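The idea behind these two steps can be illustrated with a minimal Python sketch. It is not YAFL or C#, and it is not the actual design: the decorator `runtime_api` stands in for the C# attribute (which is still to be defined), the methods `GetItem` and `Clear` and the names `Runtime` and `Table` are hypothetical, and `CallFactory` is only an approximation of what a generated Factory could check. The point is that an unknown method or a wrong arity is rejected when the call is built, not when the generated code runs.

```python
import inspect

# Descriptor -> arity, filled in by the decorator below.
API = {}


def runtime_api(dll, cls):
    """Stand-in for the C# attribute: records a method as part of the external API."""
    def mark(func):
        arity = len(inspect.signature(func).parameters)
        # Same descriptor shape as in the example: "[DLLName]ClassName::MethodName".
        API[f"[{dll}]{cls}::{func.__name__}"] = arity
        return func
    return mark


# Hypothetical runtime methods, used for illustration only.
@runtime_api("Runtime", "Table")
def GetItem(index): ...


@runtime_api("Runtime", "Table")
def Clear(): ...


class CallFactory:
    """Hypothetical factory: builds call descriptions and rejects bad ones early."""

    def make_call(self, descriptor, *arguments):
        if descriptor not in API:
            raise KeyError(f"not part of the runtime API: {descriptor}")
        expected = API[descriptor]
        if len(arguments) != expected:
            raise TypeError(
                f"{descriptor} takes {expected} argument(s), got {len(arguments)}"
            )
        return {"method": descriptor, "arguments": list(arguments)}
```

With such a table in place, the set of runtime methods the compiler may call is explicit, which also addresses the question of which parts of the API are used at all.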
With the above in place, a number of the following extensions will be implemented, depending on the choice of the intern:
- Write a refactoring tool for YAFL code that modifies the code as in the example to calls of the Factory.
- Alternatively, rewrite the code by hand.
- Examine cases where the method name is constructed, instead of being one hardcoded string, and establish if/how they can be changed to use the Factory
- Create specific Factory types with meaningful methods (e.g. ‘AppendArg_Index’ in the example above)
- Perform code coverage analysis and ensure that we have tests for all the methods called from the current version of the compiler
- Load the actual method name to invoke from an external source (e.g. an XML file), so that old compilers can compile for a more recent runtime while old methods are moved out of the way
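The last extension, resolving method names through an external file, could look roughly like the following Python sketch. The XML layout (a `runtime-methods` root with `method` elements carrying `logical` and `actual` attributes) and the method names are assumptions made for this example; the internship would define the real format.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping file: logical name used by the compiler -> actual runtime method.
MAPPING_XML = """
<runtime-methods>
  <method logical="GetItem" actual="[Runtime]Table::GetItemV2"/>
  <method logical="Clear" actual="[Runtime]Table::Clear"/>
</runtime-methods>
"""


def load_method_map(xml_text):
    """Read the logical-to-actual method name mapping from XML text."""
    root = ET.fromstring(xml_text.strip())
    return {m.get("logical"): m.get("actual") for m in root.iter("method")}


def resolve(method_map, logical_name):
    """Return the descriptor to generate for a logical method name."""
    try:
        return method_map[logical_name]
    except KeyError:
        raise LookupError(f"no runtime method mapped for {logical_name!r}") from None
```

An older compiler that asks for `GetItem` would then generate a call to whatever the mapping shipped with the newer runtime says, without being rebuilt.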
Transparent remote development and compilation
At Raincode we are, at times, faced with a client setup that mixes different computer environments, e.g. some servers running Linux, a mainframe, and a set of development workstations running Windows. A core axiom of our way of working is the concept of the single source: the same source file should compile without changes in the different environments. This means that development and maintenance of (part of) an application always happens on the same sources, regardless of the target environment.
An important challenge in these setups is enabling straightforward treatment of remote and local sources with regard to development and compilation. For a particular client we have already developed a one-off solution that helps in their context. For this client and other clients, simply obtaining all the files that are kept in a version control (VC) or source code management (SCM) system (e.g. git clone) is not an option. These repositories are huge, so keeping a complete local copy everywhere they are used is a significant and unnecessary waste of resources.
The topic of this internship is to develop a more generic solution, based on the idea of redirect files. The core idea of this solution is to, locally, have a tree of redirect files that matches the tree of source code files, which is kept remotely in a VC or SCM. A redirect file is a proxy: nothing more than a text file that specifies the location of the actual source code file. The compiler and the development environment will work with these redirect files as if they were the actual source code files themselves, as transparently as possible. The tree of redirect files is also kept in VC/SCM, obviously.
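A minimal Python sketch of the redirect idea follows. The redirect file format is not specified here, so the `redirect:` prefix on the first line is an assumption, as is the `fetch_remote` callable that stands in for whatever VC/SCM access is eventually chosen.

```python
from pathlib import Path

# Hypothetical marker: a redirect file starts with "redirect:" followed by the
# location of the real source file in the VC/SCM.
REDIRECT_PREFIX = "redirect:"


def resolve_source(text, fetch_remote):
    """Return the real source text, following a redirect if the text is one."""
    first_line = text.splitlines()[0] if text else ""
    if first_line.startswith(REDIRECT_PREFIX):
        location = first_line[len(REDIRECT_PREFIX):].strip()
        return fetch_remote(location)
    return text  # an ordinary local source file


def read_source(path, fetch_remote):
    """What a compiler front end could call instead of reading the file directly."""
    return resolve_source(Path(path).read_text(encoding="utf-8"), fetch_remote)
```

Because the caller only ever sees source text, the rest of the compiler or editor does not need to know whether a file was local or fetched.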
More in detail, they work with redirect files in the following manner:
- The compiler will fetch the appropriate sources from the VC/SCM as needed and compile locally.
- The development environment can create files that are only local or also kept in the VC/SCM, the latter also amending the tree of redirect files.
- The development environment will fetch the appropriate sources from the VC/SCM as needed and edit locally.
- When saving, the user can choose between saving a local copy or uploading to the VC/SCM, the latter with conflict resolution if needed.
- The compiler, when faced with a file that is local but also remote, needs to establish whether there is a conflict and also perform conflict resolution.
- Conflict resolution may be non-trivial in this setting, since it depends on what tool detects the conflict, as well as in which context it runs. For the development environment, conflict resolution may start with some default strategies but may also include interacting with the user. For the compiler however, conflict resolution needs to happen without user interaction when running standalone. But then again, when the compiler runs as part of a build within the development environment, all conflict resolution facilities of the development environment need to be used.
Typical use cases for this setup are the following:
- Development on workstations and a remote build farm building executables for the workstation
- Development on workstations and testing and production on a mainframe
- Applications with parts shared between different applications, kept in a mainframe VC/SCM, and parts specific to one application, kept in a separate VC/SCM.
We expect the intern to design and develop this solution to work with one of our compilers, one specific VC/SCM (to be decided) and Visual Studio or Visual Studio Code (to be decided).
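The context-dependent conflict handling described above can be sketched in Python as follows. This is only an illustration under several assumptions: that the last synchronised version of a file (`base`) is available for a three-way comparison, that the standalone compiler falls back to a configured default side, and that the development environment supplies an `ask_user` callback. None of these choices are prescribed by the topic.

```python
from enum import Enum


class Status(Enum):
    IN_SYNC = "in sync"
    LOCAL_CHANGED = "local changed"
    REMOTE_CHANGED = "remote changed"
    CONFLICT = "conflict"


def classify(base, local, remote):
    """Three-way comparison against the last synchronised version (base)."""
    local_changed = local != base
    remote_changed = remote != base
    if local_changed and remote_changed:
        # Identical edits on both sides are not treated as a conflict.
        return Status.IN_SYNC if local == remote else Status.CONFLICT
    if local_changed:
        return Status.LOCAL_CHANGED
    if remote_changed:
        return Status.REMOTE_CHANGED
    return Status.IN_SYNC


def choose(base, local, remote, ask_user=None, default="remote"):
    """Pick the text to compile or edit.

    ask_user is given when running inside the development environment;
    without it (standalone compiler) the default side wins a conflict.
    """
    status = classify(base, local, remote)
    if status is Status.LOCAL_CHANGED:
        return local
    if status is not Status.CONFLICT:
        return remote
    if ask_user is not None:
        return ask_user(local, remote)
    return local if default == "local" else remote
```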
Static SQL Performance Predictor
The Raincode compilers include a capability to convert SQL statements found in COBOL and PL/I programs, from DB2’s SQL dialect to the (vastly different) SQL Server dialect. While this transformation is designed to guarantee functional equivalence, the resulting performance of the transformed SQL statement may be significantly different. This is mainly due to the difference in optimization strategies implemented by the various database engines, and transformed statements preventing a database optimizer from using indexes adequately. An example of typical performance degradation is when the target database fails to use an index while the original database did.
The purpose of this work is to design and implement a tool that will statically predict which converted SQL statements will perform significantly slower than the original ones. At Raincode we have access to, and need to process, large portfolios of code with embedded SQL statements that are part of migration projects. The work would use such portfolios as input that guides the development direction of the tool.
This tool will:
- Take existing DB2 SQL statements, ask DB2 for the query plan, and analyze the results to calculate a performance score.
- Take the translated SQL Server statements, ask SQL Server for the query plan, and analyze the results to calculate a performance score.
- Report the difference in performance score, and report which (parts of) the SQL statement may suffer a significant performance penalty.
- Optionally, the tool will propose a means to improve performance.
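The scoring and comparison steps could take a shape like the Python sketch below. Everything specific in it is an assumption made for illustration: the `PlanStep` representation, the operator names, the per-row weights and the threshold are invented, and a real tool would derive its inputs from the query plans reported by the two database engines.

```python
from dataclasses import dataclass


@dataclass
class PlanStep:
    operator: str        # hypothetical operator names, e.g. "IndexSeek", "TableScan"
    estimated_rows: int


# Hypothetical cost per row for each operator; unknown operators get a middle value.
WEIGHTS = {"IndexSeek": 1, "IndexScan": 5, "TableScan": 20}
UNKNOWN_WEIGHT = 10


def _weight(step):
    return WEIGHTS.get(step.operator, UNKNOWN_WEIGHT)


def score(plan):
    """Higher score means a more expensive plan."""
    return sum(_weight(step) * step.estimated_rows for step in plan)


def compare(original, translated, threshold=2.0):
    """Report how much slower the translated statement is predicted to be."""
    before, after = score(original), score(translated)
    if before:
        ratio = after / before
    else:
        ratio = float("inf") if after else 1.0
    # Steps heavier than anything in the original plan are the likely culprits.
    worst_before = max((_weight(step) for step in original), default=0)
    suspects = [s.operator for s in translated if _weight(s) > worst_before]
    return {"ratio": ratio, "slower": ratio >= threshold, "suspects": suspects}
```

In this sketch, an index seek that becomes a table scan after translation is flagged both by the ratio and by the list of suspect operators.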
The final ambition of this work is to pave the way for a tool that would run on large-scale portfolios in migration projects, so that possible performance hotspots could be detected and remedied without having to go through lengthy testing, profiling and debugging sessions.
Visual Studio Plugin for C# Test Smells detection
Test smells are a variety of bad smells that are specific to test code [1]. As reported by van Deursen et al.: “Test code has a distinct set of smells, dealing with the ways in which test cases are organized, how they are implemented, and how they interact with each other.” Some examples are (taken verbatim from [1]):
- Assertion Roulette. “Guess what’s wrong?” This smell comes from having a number of assertions in a test method that have no explanation. If one of the assertions fails, you do not know which one it is.
- Indirect Testing. A test class is supposed to test its counterpart in the production code. It starts to smell when a test class contains methods that actually perform tests on other objects (for example because there are references to them in the class-to-be-tested).
- Test Code Duplication. Test code may contain undesirable duplication. In particular the parts that set up test fixtures are susceptible to this problem.
Recent research [2] has shown that smelly tests are actually correlated with code of lower quality in two ways:
- Smelly tests are more change prone and more defect prone than tests that are not. In addition, test methods with more smells are more change prone.
- The code that is tested by smelly tests is also more change prone and more defect prone. This effect on production code is a strong argument against the presence of bad test smells.
It is therefore doubly interesting to avoid writing smelly tests. Not only will the tests themselves have fewer defects and change less, but more importantly, production code will be less buggy and change less.
The goal of this work is to have both the current and future tests of our in-house tools smell like roses. A first step for this is to build a bad smell detection tool for C# code that can be run on our codebase. Yet, experiences with static analysis tools [3] have shown that this typically does not suffice. What is needed in addition is a good means to motivate developers to write better code. One way that works is through continuous feedback in their development environment. So a second step of this work is to write a plugin to the Visual Studio IDE that reveals the smelliness of tests when the developer is writing them. A third step would then also be to implement refactorings that allow the plugin to propose a way to eliminate (some of) the bad smell(s), e.g. starting with the refactorings described in [1].
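To give an idea of what detecting a smell involves, here is a small sketch of an Assertion Roulette check. It is written in Python and analyses Python `unittest` code with the standard `ast` module, purely as an analogy: the internship targets C# and Visual Studio, and a real detector would work on C# syntax trees. The rule used (more than one assertion without a `msg` keyword in a test method) is a simplification chosen for this example.

```python
import ast


def assertion_roulette(source, max_unexplained=1):
    """Map test method names to their number of assertions lacking a msg keyword.

    Only methods exceeding max_unexplained are reported.
    """
    findings = {}
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.FunctionDef) and node.name.startswith("test")):
            continue
        unexplained = 0
        for call in ast.walk(node):
            if (
                isinstance(call, ast.Call)
                and isinstance(call.func, ast.Attribute)
                and call.func.attr.startswith("assert")
                and not any(kw.arg == "msg" for kw in call.keywords)
            ):
                unexplained += 1
        if unexplained > max_unexplained:
            findings[node.name] = unexplained
    return findings


SAMPLE = '''
import unittest

class T(unittest.TestCase):
    def test_roulette(self):
        self.assertEqual(1, 1)
        self.assertTrue(True)
        self.assertIn(1, [1])

    def test_explained(self):
        self.assertEqual(1, 1, msg="identity")
        self.assertTrue(True, msg="truth")
'''
```

Running `assertion_roulette(SAMPLE)` flags `test_roulette` and leaves `test_explained` alone. An IDE plugin would run this kind of check while the developer types and attach the finding to the offending method.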
[1] Van Deursen A, Moonen L, Van Den Bergh A, Kok G. “Refactoring test code”. In Proceedings of the 2nd international conference on extreme programming and flexible processes in software engineering (XP2001) 2001 (pp. 92-95).
[2] Spadini D, Palomba F, Zaidman A, Bruntink M, Bacchelli A. “On the relation of test smells to software code quality”. In Proceedings of the International Conference on Software Maintenance and Evolution (ICSME). IEEE 2018.
[3] Harman M, O’Hearn P. “From Start-ups to Scale-ups: Opportunities and Open Problems for Static and Dynamic Program Analysis”. In Proceedings of the 18th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM). IEEE 2018.
Standards-compliant debugging in YAFL
At Raincode our tools are written in a variety of languages, using the programming language that is most adequate for the task at hand. The setup of interest for this internship is our in-house language YAFL used for the implementation of our compilers.
The YAFL compiler itself has been used in production for over 30 years now and has been constantly evolving during that time. The first version of the YAFL debugger was designed and developed after just a few years, but even though it provides all the basic capabilities one expects from such a tool, it has proven to have a steep learning curve and to be cumbersome for daily use.
This internship is about changing the existing debugging infrastructure to provide a more user-friendly experience. The goal is to make it compliant with the industry-standard gdb machine interface: https://ftp.gnu.org/old-gnu/Manuals/gdb/html_chapter/gdb_22.html and ensure that existing debugging tools (as integrated in Visual Studio Code, for instance) can be used.
We have experience with such efforts, having enabled the debugging of a similar language in Visual Studio Code. This was done through the implementation of an adaptor layer from the gdb machine interface to the debugging interface of the language. This implementation can be used as a reference for which commands to implement (as there are many and not all are used) as well as for general inspiration and examples.
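To make the machine interface more concrete, the following Python sketch parses a gdb/MI result record (a line such as `^done,bkpt={number="1",line="68"}`) into nested Python values. It is a simplified illustration and not the existing adaptor layer: it only handles result records, supports few string escapes, and performs little error checking. The function names are chosen for this example.

```python
def _parse_cstring(s, i):
    """Parse a double-quoted string starting at s[i]; return (text, next index)."""
    i += 1  # skip opening quote
    out = []
    while s[i] != '"':
        if s[i] == "\\":
            i += 1
            out.append({"n": "\n", "t": "\t"}.get(s[i], s[i]))
        else:
            out.append(s[i])
        i += 1
    return "".join(out), i + 1


def _parse_value(s, i):
    if s[i] == '"':
        return _parse_cstring(s, i)
    if s[i] == "{":
        return _parse_results(s, i + 1, "}")
    if s[i] == "[":
        return _parse_list(s, i + 1)
    raise ValueError(f"unexpected character {s[i]!r} at position {i}")


def _parse_results(s, i, closer):
    """Parse name=value pairs up to closer, or to the end of s if closer is None."""
    results = {}
    while i < len(s) and s[i] != closer:
        eq = s.index("=", i)
        results[s[i:eq]], i = _parse_value(s, eq + 1)
        if i < len(s) and s[i] == ",":
            i += 1
    return results, (i + 1 if closer is not None else i)


def _parse_list(s, i):
    """Parse a list of values, or of name=value results kept as (name, value)."""
    items = []
    while s[i] != "]":
        if s[i] in '"{[':
            value, i = _parse_value(s, i)
        else:
            eq = s.index("=", i)
            inner, i = _parse_value(s, eq + 1)
            value = (s[eq - (eq - i):eq], inner) if False else None
        items.append(value)
        if s[i] == ",":
            i += 1
    return items, i + 1


def parse_result_record(line):
    """Return (token, result_class, results) for one gdb/MI result record."""
    line = line.rstrip("\r\n")
    pos = 0
    while pos < len(line) and line[pos].isdigit():
        pos += 1
    token = line[:pos]
    if pos >= len(line) or line[pos] != "^":
        raise ValueError("not a result record")
    comma = line.find(",", pos)
    if comma == -1:
        return token, line[pos + 1:], {}
    results, _ = _parse_results(line, comma + 1, None)
    return token, line[pos + 1:comma], results
```

An adaptor layer would sit on both sides of such a parser: turning incoming MI commands into calls on the language's own debugging interface, and formatting the answers back into records of this shape.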
CONTACT
Raincode S.P.R.L.
Rue de la Caserne 45
1000 Brussels
Belgium
+32(0)2522.06.63
academia@www.raincodelabs.com