We Speak Your Language S01 E05
Hello, welcome to “We speak your language”, the podcast for computer language geeks and nerds. This episode is hosted by Darius Blasband and is brought to you by Raincode Labs.
Darius: I’m Darius and I will be your host today, here at “We speak your language”, the podcast for programming language nerds. It’s also my pleasure to welcome Manuel Serrano as our guest on today’s episode.
Manuel is a researcher at INRIA in gorgeous Sophia Antipolis, near Nice in the South of France, and he has been involved in programming language research and implementation for years. He’ll be able to tell you more about his work in a minute. It is all very impressive, but the truth is that what made me think of him is something about Manuel that impresses me way more than his technical and academic achievements.
Manuel is super nice and approachable, and we’re both French speakers, so we instantly connected. That’s how conferences go. Over drinks he explained to me how he uses his own compilers for everything he writes, along the lines of eating your own dog food, but he takes it to a level I had never heard of before, going as far as programming his home music system himself using his own compilers and his own languages.
I thought that was so cool! I both admire and envy this badass attitude: computing self-reliance pushed to the limit!
So, Manuel, welcome.
Manuel: Thank you very much, and thank you for this introduction, Darius.
Darius: Tell us a little bit more about who you are and where you come from?
Manuel: I’m a researcher at a French academic research institute called INRIA, located in the South of France, and I’ve been studying the design and implementation of compilers and programming languages for something like 30 years now.
Darius: So, if I understand correctly, you jumped into it almost from day one. I mean, you virtually did nothing else.
Manuel: Yes, exactly. I did my PhD on the compilation of functional languages, and more precisely the compilation of Scheme, which is a dialect of Lisp. I discovered Lisp while I was a student and I got fascinated by this language. What I tried to do during my PhD was to build a compiler efficient enough that you could replace traditional languages such as C with Scheme. Because we hoped we would be able to get decent performance out of our compilers, we thought it could be doable, for most applications, to replace C with Scheme or languages like it. This is rather common today, but 30 years ago it was pretty new, because the implementations at the time were not efficient enough to even consider replacing C applications.
Darius: What happened after your PhD?
Manuel: I travelled for a couple of years. I first spent one year in Montreal, still studying the compilation of Scheme, and then I moved to Palo Alto, south of San Francisco, for a post-doc at Digital Equipment (DEC). At that time, there were some researchers interested in Scheme at DEC, and I worked with them for about a year. I then moved to Geneva to work with Jan Vitek.
I then switched to another subject with Jan: we studied mobile code based on the Java language. It was a time when people considered it possible, or at least interesting, to have programs moving from one machine to another, and we tried to develop such systems. I spent a year on this, and for me it mostly meant implementing and compiling Java.
I was then recruited as a professor at a French university in Nice, where I got back to the implementation of languages and taught compilers for a couple of years.
In the 2000s, I moved to INRIA, focusing on the design and implementation of high-level programming languages.
Manuel: When I joined INRIA, I became very interested in programming languages for the Web. We developed a language called Hop, which is a multi-tier language. The idea behind Hop is that you use a single programming language (at the time, it was Scheme), and with that language you develop the entire Web application: the part that runs on the server as well as the part that runs on the client.
We got interesting results with that, because it became much simpler to build Web applications using this language. We developed it for years, and at some point we wanted to try it for industrial applications. We then realized that Scheme was a show stopper: it was not possible to use Scheme for industrial applications, or even for people who are not computer scientists, because the community of Scheme programmers is too small, which means that you lack resources, examples, tutorials, all the things that are extremely important nowadays.
Darius: I’m an implementation guy, so if you tell me you write a compiler, I’m tempted to ask OK, tell me more, tell me more about your compiler, what techniques do you use? In what language is it written? If you were to educate someone about how your compiler works and how it’s implemented, what would you say?
Darius: My introduction about, you know, self-reliance was not just a joke, right? You build compilers and then you build new compilers on top of your existing compilers.
Manuel: Yes, absolutely. And the thing I have not mentioned is that my Scheme implementation is not a pure Scheme implementation. I have introduced a lot of extra features on top of Scheme. In particular, I have an object layer that I designed especially for building compilers, so this Scheme has constructs that let me represent an abstract syntax tree in a very convenient way, and implement walkers over that abstract syntax tree extremely easily and in a way that is really extensible.
Building a new optimization for me is just implementing a new traversal of the abstract syntax tree, and generally I do that in just a couple of lines. The Scheme compiler now is so convenient for building compilers that I could not imagine using something else.
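Manuel’s object layer for AST walkers lives in his own Scheme dialect and is not shown in the conversation. As a rough analogue only, here is a C sketch of what “one optimization is one traversal” means: constant folding written as a single bottom-up walk over a tiny expression tree. The node layout and the names (`Node`, `mk`, `fold_constants`) are hypothetical, chosen for illustration.

```c
#include <stdlib.h>

/* Hypothetical node kinds for a tiny expression language. */
typedef enum { NODE_CONST, NODE_VAR, NODE_ADD, NODE_MUL } NodeKind;

typedef struct Node {
    NodeKind kind;
    long value;              /* used by NODE_CONST */
    char name;               /* used by NODE_VAR */
    struct Node *lhs, *rhs;  /* used by NODE_ADD and NODE_MUL */
} Node;

/* Allocate and initialize one node. */
static Node *mk(NodeKind kind, long value, char name, Node *lhs, Node *rhs) {
    Node *n = malloc(sizeof *n);
    n->kind = kind;
    n->value = value;
    n->name = name;
    n->lhs = lhs;
    n->rhs = rhs;
    return n;
}

/* One optimization = one traversal: fold constant subtrees bottom-up. */
static Node *fold_constants(Node *n) {
    if (n->kind != NODE_ADD && n->kind != NODE_MUL)
        return n;                         /* leaves are left unchanged */
    n->lhs = fold_constants(n->lhs);
    n->rhs = fold_constants(n->rhs);
    if (n->lhs->kind == NODE_CONST && n->rhs->kind == NODE_CONST) {
        long v = (n->kind == NODE_ADD) ? n->lhs->value + n->rhs->value
                                       : n->lhs->value * n->rhs->value;
        free(n->lhs);
        free(n->rhs);
        n->kind = NODE_CONST;             /* rewrite the node in place */
        n->value = v;
        n->lhs = n->rhs = NULL;
    }
    return n;
}
```

Folding `(2 + 3) * 4` yields the constant `20`, while `(2 * 3) + x` becomes `6 + x`, since the variable blocks further folding. Another optimization would simply be another function of the same shape.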
Darius: This is me cheating, but you know, that’s the upside of us spending time over drinks at conferences. Tell us more about why you generate C code, as opposed to bytecode of some kind, native code directly, or an intermediate representation such as LLVM. You compile Scheme, and your Scheme compiler generates C. What’s your attitude regarding that C code generation?
Manuel: I have been studying C code generation because from the very first day, I believed that generating C code was a way to get both portability and efficiency. C is not the tool or the language that delivers optimal performance on every architecture, but generally speaking it gets you close. C is extremely convenient as a tool for getting this balance between portability and performance.
This is why we decided to generate C code, and I keep asking myself whether I should be generating LLVM code instead. But it’s not so clear that there is much to gain: it is not clear that by generating something more primitive than C, I would get extra performance.
The main issue then becomes the garbage collector. If you generate assembly code, you are free to choose exactly the garbage collector you want. If you generate C code, and in particular C code that looks like C code you could write by hand, that is, C code where you see functions and variables, then you have to use specific garbage collectors that are compatible with C, which come with some constraints.
Darius: Could you elaborate on what you would call C code that doesn’t look like C code or doesn’t look like manually written C code?
Darius: I call that idiomatic C code: not just technically C code, but compliant with the idiom of C programmers.
Manuel: Yes, absolutely.
Let’s now get back to the garbage collector. If you are generating code that uses the C stack, you have a problem: how do you find the roots of your garbage collector, the references from which it starts looking for live objects? You have to use a garbage collector that is able to find, amongst the C variables, the ones that contain pointers, a garbage collector that is able to walk the C stack, all these kinds of things. This kind of garbage collector exists.
Manuel: There is one which is absolutely excellent, called the Boehm collector. It is compatible with all C compilers. It is compatible with multithreaded implementations. It is extremely reliable. It may not be the most efficient garbage collector on earth, in terms of either speed or memory footprint, but it is so reliable and delivers such good performance that using it is a really reasonable option. If you do use such a garbage collector, maybe you lose a little performance because you cannot use garbage collectors that compact memory, so you probably do not use the hardware cache as efficiently.
But you have a tremendous advantage, which is that you get the foreign function interface, the connection with other languages, for free. If you generate C as if it were plain, native C, there is no difference between being in your high-level language and being in C, so there are no boundaries to cross when you call C. You just remain in the same world and the garbage collector handles it for you, so you get a foreign function interface for free with this kind of collector.
If you are truly interested in performance, you have to consider both options and all the aspects of the choices you are making. Foreign function interfaces are very important to me because I believe all realistic applications at some point use them extensively.
For instance, if you write a lot of Web applications, as I did and still do, then cryptography is extremely important, which means you will keep calling OpenSSL, for instance. What is the cost of calling OpenSSL if you have to mark objects for the garbage collector and unmark them at some point? With Boehm’s collector, you get this for free, so these calls remain highly efficient.
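To make the idea of finding roots “amongst the C variables” concrete, here is a toy C sketch of conservative root scanning. It is not the Boehm collector’s code: the fixed-size cell heap, the array standing in for the C stack, and the names `heap`, `as_cell_index` and `scan_roots` are all hypothetical. A real collector also scans registers and static data, may recognize interior pointers, and then traces transitively from the marked objects. What the sketch does show is the conservative rule itself: any word whose value happens to equal a heap address is treated as a pointer, whether it is one or not.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HEAP_CELLS 8

/* A toy heap of fixed-size cells (hypothetical layout). */
typedef struct { uintptr_t payload[2]; } Cell;
static Cell heap[HEAP_CELLS];
static bool marked[HEAP_CELLS];

/* Conservative test: a word that is exactly the address of a heap cell
   is assumed to be a pointer to it; this toy ignores interior pointers. */
static bool as_cell_index(uintptr_t word, size_t *index) {
    uintptr_t lo = (uintptr_t)&heap[0];
    uintptr_t hi = (uintptr_t)&heap[HEAP_CELLS];
    if (word < lo || word >= hi || (word - lo) % sizeof(Cell) != 0)
        return false;
    *index = (word - lo) / sizeof(Cell);
    return true;
}

/* Mark every cell referenced by a word in the root area (a stand-in for
   the C stack) and return how many distinct cells were marked. */
static size_t scan_roots(const uintptr_t *roots, size_t n) {
    size_t count = 0, idx;
    for (size_t i = 0; i < HEAP_CELLS; i++)
        marked[i] = false;
    for (size_t i = 0; i < n; i++) {
        if (as_cell_index(roots[i], &idx) && !marked[idx]) {
            marked[idx] = true;
            count++;
        }
    }
    return count;
}
```

Because the collector cannot tell a genuine pointer from an integer that merely looks like one, it can never rewrite such a word, which is why objects found this way must stay where they are. The same property is what lets native C code hold references to collected objects without registering them.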
Darius: It’s funny, because you framed this as “I give up performance for interoperability, transparency, or the lack of boundaries”, meaning the ability to have your functions sit seamlessly in a non-garbage-collected world, let’s put it this way. Most of the conversations I’ve had or heard about this were more along the lines of ‘It makes the life of the compiler writer much simpler because there is no need to maintain a precise garbage collection infrastructure’. And the way it was put to me, the tradeoff is not so much about performance, but about the real or perceived risk of something going wrong because of the intrinsically conservative nature of Boehm’s garbage collector.
Manuel: There are several aspects in your remark. First talking about performance. If you are using an ambiguous collector, at some point you will pay a high price on some particular application.
Let me focus on that for a moment.
The ambiguous collector cannot move objects because it is never sure whether a word is an integer or a pointer. It has to be conservative: it cannot afford to change what might be an integer value, so it leaves objects where they are.
Because you are not moving objects, you cannot compact memory, and in some way you have to use free lists for allocating values. You are not as efficient as a compacting copying collector. Clearly, if you build a benchmark that relies heavily on being able to reclaim objects very efficiently, and if your objects have a very short lifetime, then you will have the feeling that Boehm’s collector is extremely slow and behaves very poorly.
But this is on very peculiar benchmarks, ones that allocate a lot of short-lived objects. This, you cannot escape. If you come with a bunch of benchmarks that do this extensively, Boehm will not be an option. This is where we have to pay attention.
Interoperability was the main motivation for me. From that perspective, Boehm’s collector is incomparable: it gives you a flexibility that you will never get with anything else. That, more than the simplicity of the compiler itself, was the main motivation.
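The contrast between a non-moving collector’s free lists and a compacting collector can be sketched as two toy allocators in C. The names (`Slot`, `BumpRegion`, `FreeList` and their functions) are hypothetical and no real collector is shown. Both allocate in constant time; the difference Manuel points at lies elsewhere. After compaction, all free memory is one contiguous block, so allocation is a pointer bump and consecutive objects sit next to each other in the cache, and a copying collector’s work is proportional to the live data, which makes many short-lived objects cheap. A non-moving collector instead has to put dead cells back on free lists and reuse them wherever they happen to be.

```c
#include <stddef.h>

#define NCELLS 4

/* A cell is either in use (holding a value) or linked into a free list. */
typedef union Slot {
    union Slot *next;
    long value;
} Slot;

/* Bump allocation: possible when a compacting collector leaves all free
   memory in one contiguous region. */
typedef struct { Slot *cur, *end; } BumpRegion;

static Slot *bump_alloc(BumpRegion *r) {
    if (r->cur == r->end)
        return NULL;              /* region exhausted: time to collect */
    return r->cur++;
}

/* Free-list allocation: what a non-moving collector falls back on, since
   reclaimed cells stay where they are, scattered through memory. */
typedef struct { Slot *head; } FreeList;

static void free_list_push(FreeList *fl, Slot *s) {
    s->next = fl->head;           /* reclaimed cell goes on the front */
    fl->head = s;
}

static Slot *free_list_alloc(FreeList *fl) {
    Slot *s = fl->head;
    if (s != NULL)
        fl->head = s->next;       /* hand out the most recently freed cell */
    return s;
}
```

With the bump region, successive allocations come out at adjacent addresses. With the free list, they come out in whatever order cells were reclaimed, which is where the cache locality Manuel mentions is lost.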
Darius: I fully agree that you can build a benchmark where a conservative garbage collector such as Boehm’s induces a performance penalty. But similarly, you could build benchmarks where it in fact behaves better. Typically, an application that allocates its objects at its inception and then works with that set of objects without performing too many additional allocations. Then, by not having the housekeeping code that keeps track of everything, you can actually get better performance. Obviously we’re talking about another set of benchmarks, but these questions are never black and white.
Manuel: Absolutely, and then there is the question of which benchmarks you are using and how representative they are.
Darius: I guess it also depends on the language. There are languages where allocations are performed mechanically, where allocations are performed just because of the semantics of the language as opposed to languages where allocations are explicit operations by the developer and then the developer can control that aspect of the behavior of the system.
Manuel: The focus of most studies on the implementation of functional languages has been to remove these implicit allocations. This is typically what we tried very hard to do for 20 or 30 years: to allocate only when the programmer explicitly allocates. When I started my PhD, there was no implementation of Scheme able to execute Scheme code without allocating for the execution of the control flow, and this is what I did in my PhD: creating techniques to get rid of these heap allocations and replace them with stack allocation, as we do in C, for instance.
So nowadays, my Scheme compiler typically allocates only when you explicitly create an object; by itself, it no longer allocates. This is why, if you compare the performance of a Scheme program with its equivalent C code, and the program does not use a lot of heap allocation, the Scheme code runs as efficiently as the C code.
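The conversation does not show what Manuel’s compiler actually emits. As a hypothetical illustration only, here is the kind of idiomatic C a Scheme compiler could produce for a tail-recursive function once it knows that both arguments are small integers (fixnums): the tail call becomes a loop, nothing is boxed, and neither the call nor the control flow touches the heap. The function name `sum_to` and the fixnum assumption are ours.

```c
/* Scheme source (illustrative):
     (define (sum-to n acc)
       (if (= n 0) acc (sum-to (- n 1) (+ acc n))))
   Hypothetical C translation, assuming n and acc are known to be fixnums
   and n >= 0. The tail call is compiled as a loop, so the function runs
   in constant stack space and performs no heap allocation. */
static long sum_to(long n, long acc) {
    while (n != 0) {
        acc += n;   /* (+ acc n), using the current n */
        n -= 1;     /* (- n 1) */
    }
    return acc;
}
```

Code of this shape is what a C programmer would write by hand, which is why, absent heap allocation, there is little left to separate the Scheme program’s performance from the C version.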
Darius: How do you describe what you do when you meet people outside the computing world, when people ask “What do you do for a living?” What do you tell them?
Manuel: I present myself as a researcher. I think it’s easier for people to get an intuition of what I’m doing if I present myself as a mathematician instead of a computer scientist, because as soon as you pronounce the words computer science, people imagine that you keep tweaking machines, being a hacker, developing various tools or something like that.
Darius: They hear computer, they forget science.
Manuel: Exactly. And if I consider my daily life, I build analyses, I build formalisms to describe programming languages. I rely on mathematics for that. As soon as we are talking about semantics, for instance, we are essentially doing logic and similar things. We build theorems, and sometimes we prove them.
Although I believe that computer science is mathematics, with something absolutely specific to it, I think that if I say mathematician, people have a better understanding of what I do. If I’m talking to non-experts, I really describe myself as a mathematician. And if I’m talking with people who are familiar with computer science, then I present my work on compilers as the problem of translating from one language to another, as you could do with natural languages. Except that, and I insist, programming languages are unambiguous, which changes the game of translation a lot.
Darius: What would you do if you weren’t involved in language compilers?
Manuel: So, you mean if I was not involved in compilers but involved in computer science?
Darius: Assuming that you still have to make a living and you know how computers work: if you weren’t involved in programming languages and compilers, what would be your topic of interest?
Manuel: Probably operating systems. I think there are a lot of things to be done in the world of operating systems, and people are doing extremely interesting things there, so this might be one option. The other is that I have a true fascination for computer hardware design. I think the people who build processors are incredibly smart. I cannot believe that they are able to optimize and improve processor after processor, relying on very smart technology, very smart science, very smart algorithms. I find this world extremely fascinating. I would probably be playing in this game.
Darius: In your fantasy world you would push self-reliance to the point where you program your system using your own compilers for your own languages on your own operating system and your own hardware.
Manuel: Could be, it could be an option, yes.
Darius: Now I’m officially scared.
What would you do if you weren’t involved in computing at all, if computing was off the table for some reason?
Manuel: I don’t really know. I have never hesitated between several possible careers; I knew a long time ago that I wanted to do computer science. But I would probably be doing something like climbing professionally or teaching people how to climb, because this is another big thing in my life. Either I’m programming or I’m rock climbing.
Darius: And I understand the rationale, living where you live, so close to the Alps.
Manuel: Absolutely, but this is no accident, this was a decision, I left Paris for the mountains.
Darius: How old were you when you wrote code for the first time?
Manuel: I remember that because I received my first computer for my 12th birthday, so I started programming when I was 12 years old, and I used to program in BASIC on a very tiny computer, a pocket computer at the time, and it was a lot of fun. And this is the moment where I think I fell in love with computer science.
Darius: Tell us more about the experience, because it obviously was a different world. There’s nothing similar to those kinds of devices nowadays.
Manuel: This was a long time ago, at a moment when computer science still had to be invented. Everything was new. It was a kind of new territory, and it was fascinating because of that. These computers were extremely interesting for learning because they were extremely small. If I remember correctly, my first computer had something like one or one and a half kilobytes of memory. At first, you had to program it in BASIC, which was extremely restricted, and a very good thing about a very constrained environment is that you have to be innovative and imaginative.
It takes a lot of imagination to do fun things in this kind of constrained environment. The other interesting thing is that because it was so small, it was possible to know almost everything about it, and I knew almost all of my computer. At some point I even knew by heart the whole memory layout and the whole operating system, which was implemented in assembly on that machine, and this was a lot of fun for learning computer science. It was a great tool, a fascinating tool, and in some sense I regret that young people today cannot benefit from such interesting environments any longer. Everything is so complicated. Implementing a real application is so complex on a modern computer, or even worse on a phone.
I imagine that for young people it’s difficult to imagine being able to do something significant without a long period of learning and practice. That period was really fascinating, I think, and I was very lucky to benefit from it.
Darius: One of my favorite questions, because we hear so many different answers to it: what software project would you have loved to be part of?
Manuel: One of the pieces of software that impressed me the most is definitely Linux. I think this is a tremendous project for various reasons. First, because technically speaking, it is of incredibly high quality. But also for the human adventure it represents: a collaborative project pushed to that level, with so many people involved, and the fact that it escaped the traditional way of building industrial things, where at some point you have to rely on an industrial partner that will probably make a high profit out of it, which is really off the radar for Linux. Linux is built on a totally different basis, and I find this extremely fascinating. I would have loved to be involved in Linux, but as I said, I have no competence in operating systems, at least not yet.
Darius: What do you consider the most important quality for someone in our trade, doing the kind of work we do?
Manuel: You have to be extremely rigorous, and you have to be a hard worker. Programming, and programming well, really takes a lot of exercise, a lot of practice, probably a little like practicing a musical instrument. You have to read code. You have to practice all the time, which means that you have to work long hours. Being a hard worker and an extremely rigorous person: I think these are the two most important skills.
Darius: You’re essentially concurring with Malcolm Gladwell, who says that to become an expert in something you have to spend 10,000 hours on it, which kind of assumes that if I spent 10,000 hours on ballet dancing, I would be competent, which is a stretch, I appreciate. Rigor is a very obvious and generic quality, but other than that, it’s just grit and keeping at it.
Manuel: Yes, but you know, I think that when you are programming, you have to keep improving your own application, your own source code, so you have to work on your source code over and over again. I once attended a talk given by Richard Stallman, and he said something that I found extremely striking. He said that when someone reported a bug in GCC, either he was able to fix the bug within five minutes, or he rewrote the code. And I believe in that extremely deeply. You have to keep working on the same source code over and over again, which means that you have to work very hard. And actually, yes, this is what I do every day. Every day I change something; there is no day when I do not program, ever.
Darius: You are like Stravinsky who said he cannot spend a day without writing music.
Manuel: Yes, something like that.
Darius: What do you consider your most important professional or technical or scientific quality? What do you think makes you the best at what you do?
Darius: It is an ecosystem. You have to live in it.
Darius: What do you consider your most important professional, technical, scientific flaw? What do you think you’re really not that good at?
Manuel: In the career I’m following, which is an academic career, it’s important to be able to write papers, and I am not as efficient at it as I should be. I so deeply prefer programming to writing papers, and this is probably my main flaw: not being able to write papers more efficiently.
Darius: If there was one language you wish never existed, one language which you could erase from the planet, which would that be? If you could choose a language, let’s say I have a magic wand and I can remove this language from the planet Earth, which language would you choose?
Manuel: I would not erase a language per se, but I would remove an API. I very strongly dislike the Android Java ecosystem. It’s so weird, so strangely designed, that I would really remove this system and replace it with something else.
Believe it or not, I port my compilers and my system to all systems, and I have a port for Android, Hop runs on Android, but Java on Android is so complex that there is one basic functionality that I’m not able to implement correctly on that system, and that is the termination of the system itself!
That is, I’m not able to implement exit() correctly on Android. I don’t know how to do it. This tells you a little about the complexity and the design of the system. You know, the beauty of Unix is that you have few concepts that you can combine. Android is exactly the opposite of that. Everything is in everything. Everything has implications on everything. So programming in this world is awful for me. Java on Android is really the thing I wish had never existed.
Darius: It is funny because there is that thread across everything you say that starts with self-reliance and when people get in the way of that self-reliance, you’re frustrated. I mean, you really want to be able to be the master of your destiny, which is really a common theme in everything you say.
If you were to direct our audience to read one book, one article, one author, computing-related or not, what would that be?
Manuel: There is an awesome, gorgeous book about computer science, maybe my favorite, by Patterson and Hennessy: Computer Architecture: A Quantitative Approach. It’s so beautiful and so convincing that computer science is really a science of its own. It’s absolutely gorgeous. I think all computer scientists should read this book.
Darius: Guilty as charged. I promise I will read it and get back to you.
Manuel: It is almost a book about compilers, because it tells you about the instructions the compiler will generate, and why the compiler has to generate this particular instruction. It’s really about the design of the instruction set of your computer. It’s so well written, it’s gorgeous.
Darius: In the same vein, what is the book you wish you could force yourself to forget, so that you could read it over and over again?
Manuel: I learned a lot about programming from a book called Structure and Interpretation of Computer Programs, by Abelson and Sussman. It’s gorgeous, and once again it’s a book that tells you why programming is a science, and it’s so beautiful. It’s so smart that reading it is just an eternal pleasure.
Darius: Eternal is the word, so you would love to be able to read it again and again and never get bored because you forget every time.
Darius: Great! What would you like to be remembered for?
Manuel: My compilers: the fact that I’ve been able to build various compilers, that most of them are efficient, maybe not the most efficient but amongst the most efficient, and that I’ve been able to build all these compilers by myself.
Darius: Since you are a builder then, and really intrinsically you’re a builder, what are your plans for the coming five years? What do you wish to be able to build for the coming, say 5 or 10 years?
Darius: Manuel, thank you so much. You’ve been a wonderful and graceful guest. Thank you for joining us on ‘We Speak Your Language’, and we’ll be talking soon. Take care.
Manuel: Yes, thank you very much Darius for the invitation and for this interesting conversation that we have just had. Thank you so much.
The Boehm–Demers–Weiser garbage collector: https://www.hboehm.info/gc/
Hennessy and Patterson, Computer Architecture: A Quantitative Approach: https://www.elsevier.com/books/computer-architecture/hennessy/978-0-12-811905-1