Full text in English


When designing a programming language, one has to take many goals into consideration, including performance, usability, and mathematical rigour. It is often easy to concentrate on a subset of them while neglecting the rest. What, in your opinion, are the most commonly overlooked goals, or the most common pitfalls, in programming language design?

The most commonly overlooked goal is simply to treat computer programming as a human activity.  We are linguistic creatures, and spend much of the day communicating in human language.  Our native tongues are expressive, and our use of them is often driven by motivations external to the immediate utterance. To properly analyze human language, one must pay attention to many levels of context.  It is easy for a computer language designer to fall into the trap of thinking we just have to think like computers, and not consider ways in which we can make the computer think more like we do.  Providing more than one way to express an idea in a computer language runs against the mathematical idea of "orthogonality", but it opens up the expressivity of the language to be able to respond to the external contexts that are of interest to the programmer. In addition to optimizing for performance or usability (whatever that is), one might want to optimize for resource consumption, for reliability, for teachability, for maintainability, for conciseness, for elegance, or for brute force.  In one context you might want to write readable code, but that is explicitly not the goal when you're entering an obfuscated code contest. So languages should respond well to externalities.

Human languages are also used by many different people of differing abilities.  Our languages are rich enough that we can express complicated ideas concisely.  Some people are poets, others are just using their language to get the job done, or to swear about not getting it done.  Similarly, a good computer language will also flow smoothly around the problem space, allowing different kinds of people to solve different kinds of problems without inducing unnecessary complexity in the solution space.  We call this a good "impedance match".

Expressed in terms of pitfalls, the typical language designer too often thinks of their language in purely technical terms, and perhaps about the direct cultural effects, but tends not to think about second- and third-order cultural effects, where people start helping each other in communities and contributing back to the development of the language and of the culture itself.  For that reason we say that Italian is a living language, but Latin is a dead language. I think the Perl community is just as important as the Perl language.

There are many other traps that language designers often fall into.
It's too easy to settle on the first idea you think of, and not consider whether there might be a better way.
It's too easy to take a single theoretical idea and force everything to be viewed through that lens.  As the saying goes, "If all you have is a hammer, everything starts to look like a nail." Beware of languages that claim "Everything is a ___." Everything is not an object.  Everything is not a function.  Everything is not an assertion, or list, or event.
It's too easy to "hang your coat on the wrong peg", that is, to make globals that should be dynamics, or make dynamics that should be lexicals, or make lexicals that should belong to an object, or to keep your computational state in objects when it should be reflected in the call graph.  Or the other way, it's easy to make all the opposite mistakes as well.
It's too easy to think that if you make your language simple, it simplifies matters for the programmer, when in fact, if your language is too simple for the problem you're trying to solve, it complicates the job of both the library programmer and the application programmer.
It's too easy to assume that your programmers are all geniuses, or all idiots.  You'll have some of both, and many in the middle.
It's too easy to think that your language will only be used for small programs, or large ones.
It's too easy to think your language will only be used by one culture.
It's too easy to think that your users must learn the entire language at once, when most human languages take years to learn.
It's too easy to force programmers to think the same way you do; instead, give them the opportunity to learn how you think when they are ready for it.

The current trends in hardware design suggest that future hardware will be more and more parallel. Do you think that programming languages should try to include features that allow the programmer to easily exploit the new parallel architectures? If so, to what extent?

Absolutely. But it depends on what you mean by "easily".  Some languages have parallel features that are not at all easy to use, because other features interfere due to insufficient decoupling. Perl 5, for instance, has too many global variables to do threading properly, so we designed Perl 6 with far fewer globals.

The big temptation is to think you've solved the parallelism problem when you've just added one feature, such as threads, or pipelines, or vector processing.  There are many different kinds of parallelism suitable for different kinds of problems, so the "one size fits all" approach doesn't really work well unless you pick your problems more carefully than your users will.

In the design of Perl 6, there are many ways in which we've considered the future of parallel processing. Fundamentally, the trick is to invent notations natural to the problem domain, such that the programmer can express parallel ideas simply without introducing unwanted dependencies or sequentialities.  That's easy to say, but hard to carry out without rethinking a lot of the received wisdom of the ages.

For instance, the typical model of exception handling via control flow (unwinding the stack) tends to be bad for any kind of vector processing.  You might have 999 good sensor readings, but the one bad reading blows up the rocket because your exception model depends on treating the error handling as control flow when it should be considered a form of undefined data.  Hence Perl 6 tends to treat exceptions as data, and only throws exceptions lazily if you never check the value for validity yourself.
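To make that concrete, here is a minimal sketch of failures-as-data in Perl 6 (Raku); the sensor routine and its failure condition are invented purely for illustration:

    sub read-sensor(Int $n) {
        # Hypothetical sensor: pretend every 100th reading is bad.
        fail "sensor $n out of range" if $n %% 100;
        return $n * 0.1;
    }

    my @readings = (1..999).map: { read-sensor($_) };

    # Each bad reading is a Failure object: inert data until it is used
    # unchecked.  Testing .defined marks it as handled, so nothing throws
    # and the good readings survive the few bad ones.
    my @good = @readings.grep(*.defined);
    say "kept { +@good } of { +@readings } readings";

The point is that the error never becomes control flow unless the programmer ignores it, which is what makes the model safe to vectorize.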

We've thought about the balance of active vs passive data, since passive data can be immutable, and easily copied, while active data has to keep shared state.  So active data parallelism maps well to an object-oriented style, while passive data parallelism maps better to the functional programming style.
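As a rough sketch of that distinction, with a toy workload and a worker count picked only for illustration:

    # Passive, immutable data: many workers may read it without coordination.
    my @data = 1..1000;

    my @jobs = (^4).map: -> $i {
        start { [+] @data[$i * 250 ..^ ($i + 1) * 250] }  # sum one quarter
    };
    my @partial = await @jobs;
    say [+] @partial;          # 500500

    # Active, shared state needs explicit coordination instead.
    my $total = 0;
    my $lock  = Lock.new;
    my @writers = (^4).map: -> $i {
        start { $lock.protect: { $total += $i } }
    };
    await @writers;
    say $total;                # 6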

We have various constructs in Perl 6 that make promises about parallelizability, and refrain from introducing any unnecessary sequentiality.  Most normal operators can be turned into vector operators, which we call "hyper-operators".  These operators promise that you don't care what order the operations are done in as long as the results come back in the correct order.  This maps naturally onto the operations of an array processor such as a GPU.
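For instance (the arrays and values here are arbitrary):

    my @a = 1, 2, 3, 4, 5;
    my @b = 10, 20, 30, 40, 50;

    # The hyper form of + promises that the pairwise additions may happen
    # in any order, possibly in parallel; only the result order is fixed.
    say @a »+« @b;      # elementwise sums: 11 22 33 44 55
    say @a >>+<< @b;    # the same operator in its ASCII spelling

    # Hyper method calls make the same promise for unary operations.
    say @a».sqrt;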

There are other operators that work more like pipelines in Unix, or event queues in, say, Erlang.  These operators promise that the source and the sink of data don't interfere with each other in non-deterministic ways (or that if they are non-deterministic, you don't care).  At a more fundamental level, all sequential list processing is done lazily in Perl 6, so even though things have to be done in a particular order, the various computations may be pipelined across multiple processors.  However, unlike in some lazy languages, Perl's lazy lists do not promise strict laziness by default, since it may be worthwhile to work ahead in batches depending on how many cores you have and how your scheduler works.
Whether lists are lazy or eager, it's important that a language not force a concrete view of list data that implies sequential processing.  Many functional programming languages fall into this trap by encouraging the programmer to define list algorithms in terms of a head element followed by a tail list.  Instead, list composition logic needs to be abstracted out from operations on individual elements.
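Here is a short sketch of both ideas, with arbitrary data; note that the pipeline stages are whole-list transformations rather than hand-rolled head-and-tail recursions:

    # Feed operators set up a left-to-right pipeline; each stage promises
    # only not to interfere with its neighbours, so stages may overlap.
    my @squares;
    (1..1000)
        ==> grep(* %% 3)
        ==> map(* ** 2)
        ==> @squares;
    say @squares[^5];   # 9 36 81 144 225

    # Laziness without a strict promise: an infinite sequence is fine as
    # long as you only ask for the part you need, and the runtime is free
    # to work ahead in batches.
    my @evens = (1..Inf).grep(* %% 2);
    say @evens[^5];     # 2 4 6 8 10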

There are other considerations when getting into more statistical forms of parallel processing, such as map/reduce algorithms. How much indeterminacy is too much?  How do you know when you've collected enough good results to start telling the user something? When do we know enough to buzz in on the next Jeopardy question like Watson?  The ideal situation is when we can just tell our computers what we want to know, and let them figure out how to figure it out.

In many fields, people are starting to use domain-specific languages (DSLs) for describing and analyzing complex systems. The design process for a DSL has to target a smaller community of users, and focus on the key aspects of the domain at hand. Still, it shares many goals with the design of general-purpose languages. What is your opinion on the subject? Do you believe some aspects of the design process become more important in DSL design?

Obviously, some amount of domain-specific knowledge becomes more important, or the DSL won't be any more useful than a general language would.  But there are two ways things can go wrong here. If designed by a domain expert, the language may be designed poorly as a language.  Contrariwise, if designed by a language expert, the language may be designed poorly for the domain.  There are really two domains here, so you need to have expertise in both.

Unfortunately, most language designers are a bit unbalanced, and tend to overreact to perceived problems.  This cuts both ways. A DSL is usually designed because of the perceived failure of general languages.  On the other hand, general languages often grow out of the perceived limitations of domain-specific languages.

There are two ways out of this.  One is to encourage the existence of many small languages that have a specific domain.  The other is to find some way to pretend that a general language is domain-specific. Bell Labs originally encouraged the first approach, but it turned out that the "do one thing and do it well" philosophy didn't quite work, because nobody quite knew what the "one thing" was, and in any case many of the tools that did one thing didn't do it well.

There are two other problems with the bottom-up approach.  You have to find some way of intercommunicating between tools written in different languages, which is difficult, especially in the presence of active data (objects) that require transactional consistency. Another big problem is that each small language ends up running into problems that are very difficult or impossible to solve, simply because the language designer did not imagine their language being used that way.

So I tend to prefer the top-down approach of creating a DSL by subsetting and extending a general language.  (This presumes, of course, that the general language is mutable enough to be bent into the desired shape of the DSL, so one of the explicit design goals of Perl 6 is to be that kind of mutable language.)
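As a toy illustration of that kind of mutability (the tolerance operator and the little Query grammar below are invented for this example, not taken from any real DSL):

    # A domain-flavoured operator grafted onto the general language.
    sub infix:<±>(Numeric $x, Numeric $tol) { ($x - $tol) .. ($x + $tol) }
    say 5.1 ~~ (5 ± 0.2);       # True: smartmatch against the range

    # A small grammar, the usual seed of a Perl 6-hosted DSL.
    grammar Query {
        rule  TOP   { <field> '=' <value> }
        token field { \w+ }
        token value { \w+ }
    }
    say Query.parse('status = open')<field>;   # 「status」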

There are also several possible failure modes of the top-down approach.  If you choose to hide the general parts of your language, you might hide them so well that the user cannot get to them when they need the general features.  ("Easy things should be easy, and hard things should be possible.") Or if you do make the general features available to your DSL through some kind of escape hatch, the user might fall through the hatch accidentally and not be able to understand the resulting error message, unless they also understand the general language on which the DSL is based.

If your DSL is either more or less purist than the host language in terms of either FP or OO, you'll have to deal with that mismatch of semantic models.  (It helps to have a multi-paradigmatic language for your host language.)
In one way it may be easier to design a DSL than a general language, because you have a much better idea of who your specific audience is and what they want to do, and how well-equipped they are to think about it.  Similarly, since the problem domain is limited, you'll have a better idea which parts of the process need to be highly optimized, and which parts you don't have to care about.

With a general language, you can't know as much about which language features the programmer will choose to use in the middle of their tightest loop.  You have to guess about what the typical user might want to do, and optimize a little bit everywhere. With most performance enhancement, the typical advice is to use a profiler to see where the hot spots are, but when you're inventing a general language or its implementation, you don't have the user's code to profile against because it hasn't been written yet!

In any case, language design is both an art and a science. There are many wonderful principles at work, but there is no principle that tells you in advance which principle you should be paying attention to first.  So you just have to go by your gut-level feelings much of the time, and be prepared to be told frequently how and why your gut feelings are wrong.  If this is your idea of fun, you too can be a language designer.