=encoding utf-8

TITLE

Apocalypse 2: Bits and Pieces

AUTHOR

Larry Wall <larry@wall.org>

VERSION

  Maintainer: Larry Wall <larry@wall.org>
  Date: 3 May 2001
  Last Modified: 18 May 2006
  Number: 2
  Version: 6

Here's Apocalypse 2, meant to be read in conjunction with Chapter 2 of the Camel Book. The basic assumption is that if Chapter 2 talks about something that I don't discuss here, it doesn't change in Perl 6. (Of course, it could always just be an oversight. One might say that people who oversee things have a gift of oversight.)

Before I go further, I would like to thank all the victims, er, participants in the RFC process. (I beg special forgiveness from those whose brains I haven't been able to get inside well enough to incorporate their ideas.) I would also like to particularly thank Damian Conway, who will recognize many of his systematic ideas here, including some that have been less than improved by my meddling.

Here are the RFCs covered:

    RFC  PSA  Title
    ---  ---  -----
      Textual
    005  cdr  Multiline Comments for Perl
    102  dcr  Inline Comments for Perl

      Types
    161  adb  Everything in Perl Becomes an Object
    038  bdb  Standardise Handling of Abnormal Numbers Like Infinities and NaNs
    043  bcb  Integrate BigInts (and BigRats) Support Tightly With the Basic Scalars
    192  ddr  Undef Values ne Value
    212  rrb  Make Length(@array) Work
    218  bcc  C<my Dog $spot> Is Just an Assertion

      Variables
    071  aaa  Legacy Perl $pkg'var Should Die
    009  bfr  Highlander Variable Types
    133  bcr  Alternate Syntax for Variable Names
    134  bcc  Alternative Array and Hash Slicing
    196  bcb  More Direct Syntax for Hashes
    201  bcr  Hash Slicing

      Strings
    105  aaa  Remove "In string @ must be \@" Fatal Error
    111  aaa  Here Docs Terminators (Was Whitespace and Here Docs)
    162  abb  Heredoc Contents
    139  cfr  Allow Calling Any Function With a Syntax Like s///
    222  abb  Interpolation of Object Method Calls
    226  acr  Selective Interpolation in Single Quotish Context
    237  adc  Hashes Should Interpolate in Double-Quoted Strings
    251  acr  Interpolation of Class Method Calls
    252  abb  Interpolation of Subroutines
    327  dbr  C<\v> for Vertical Tab
    328  bcr  Single Quotes Don't Interpolate \' and \\

      Files
    034  aaa  Angle Brackets Should Not Be Used for File Globbing
    051  ccr  Angle Brackets Should Accept Filenames and Lists

      Lists
    175  rrb  Add C<list> Keyword to Force List Context (like C<scalar>)

      Retracted
    010  rr  Filehandles Should Use C<*> as a Type Prefix If Typeglobs Are Eliminated
    103  rr  Fix C<$pkg::$var> Precedence Issues With Parsing of C<::>
    109  rr  Less Line Noise - Let's Get Rid of @%
    245  rr  Add New C<empty> Keyword to DWIM for Clearing Values
    263  rr  Add Null() Keyword and Fundamental Data Type

Atoms

Perl 6 programs are notionally written in Unicode, and assume Unicode semantics by default even when they happen to be processing other character sets behind the scenes. Note that when we say that Perl is written in Unicode, we're speaking of an abstract character set, not any particular encoding. (The typical program will likely be written in UTF-8 in the West, and in some 16-bit character set in the East.)

Molecules

RFC 005: Multiline Comments for Perl

I admit to being prejudiced on this one -- I was unduly influenced at a tender age by the rationale for the design of Ada, which made a good case, I thought, for leaving multiline comments out of the language.

But even if I weren't blindly prejudiced, I suspect I'd look at the psychology of the thing, and notice that much of the time, even in languages that have multiline comments, people nevertheless tend to use them like this:

    /*
     *  Natter, natter, natter.
     *  Gromish, gromish, gromish.
     */

The counterargument to that is, of course, that people don't always do that in C, so why should they have to do it in Perl? And if there were no other way to do multiline comments in Perl, they'd have a stronger case. But there already is another way, albeit one rejected by this RFC as "a workaround."

But it seems to me that, rather than adding another kind of comment or trying to make something that looks like code behave like a comment, the solution is simply to fix whatever is wrong with POD so that its use for commenting can no longer be considered a workaround. Actual design of POD can be put off till Apocalypse 26, but we can speculate at this point that the rules for switching back and forth between POD and Perl are suboptimal for use in comments. If so, then it's likely that in Perl 6 we'll have a rule like this: If a =begin MUMBLE transitions from Perl to POD mode then the corresponding =end MUMBLE should transition back (without a =cut directive).

Note that we haven't defined our MUMBLEs yet, but they can be set up to let our program have any sort of programmatic access to the data that we desire. For instance, it is likely that comments of this kind could be tied in with some sort of literate (or at least, semiliterate) programming framework.

RFC 102: Inline Comments for Perl

I have never much liked inline comments -- as commonly practiced they tend to obfuscate the code as much as they clarify it. That being said, "All is fair if you predeclare." So there should be nothing preventing someone from writing a lexer regex that handles them, provided we make the lexer sufficiently mutable. Which we will. (As it happens, the character sequence "/*" will be unlikely to occur in standard Perl 6. Which I guess means it is likely to occur in nonstandard Perl 6. :-)

A pragma declaring nonstandard commenting would also allow people to use /* */ for multiline comments, if they like. (But I still think it'd be better to use POD directives for that, just to keep the text accessible to the program.)

[Update: It eventually became apparent (after five years!) that we could simplify the distinction between postfix and infix operators if we had a general way to embed comments, so we now have a general quote-like mechanism for embedded comments such that you can say $foo\#(bar).baz to mean $foo.baz. Basically, if the # character is immediately followed by a bracket, that bracket pair determines the scope of the comment. (If you're wondering how the backslash/dot become one dot in the example, see the explanation of the "long dot" in S02.)]

Built-In Data Types

The basic change here is that, rather than just supporting scalars, arrays and hashes, Perl 6 supports opaque objects as a fourth fundamental data type. (You might think of them as pseudo-hashes done right.) While a class can access its object attributes any way it likes, all external access to opaque objects occurs through methods, even for attributes. (This guarantees that attribute inheritance works correctly.)

While Perl 6 still defaults to typeless scalars, Perl will be able to give you more performance and safety as you give it more type information to work with. The basic assumption is that homogeneous data structures will be in arrays and hashes, so you can declare the type of the scalars held in an array or hash. Heterogeneous structures can still be put into typeless arrays and hashes, but in general Perl 6 will encourage you to use classes for such data, much as C encourages you to use structs rather than arrays for such data.

One thing we'll be mentioning before we discuss it in detail is the notion of "properties." (In Perl 5, we called these "attributes," but we're reserving that term for actual object attributes these days, so we'll call these things "properties.") Variables and values can have additional data associated with them that is "out of band" with respect to the ordinary typology of the variable or value. For now, just think of properties as a way of adding ad hoc attributes to a class that doesn't support them. You could also think of it as a form of class derivation at the granularity of the individual object, without having to declare a complete new class.

[Update: We're now calling compile-time properties "traits". And objects don't really have properties separate from their attributes--this is now handled with a mixin mechanism.]

RFC 161: Everything in Perl Becomes an Object

This is essentially a philosophical RFC that is rather short on detail. Nonetheless, I agree with the premise that all Perl objects should act like objects if you choose to treat them that way. If you choose not to treat them as objects, then Perl will try to go along with that, too. (You may use hash subscripting and slicing syntax to call attribute accessors, for instance, even if the attributes themselves are not stored in a hash.) Just because Perl 6 is more object-oriented internally, does not mean you'll be forced to think in object-oriented terms when you don't want to. (By and large, there will be a few places where OO-think is more required in Perl 6 than in Perl 5. Filehandles are more object-oriented in Perl 6, for instance, and the special variables that used to be magically associated with the currently selected output handle are better specified by association with a specific filehandle.)

RFC 038: Standardise Handling of Abnormal Numbers Like Infinities and NaNs

This is likely to slow down numeric processing in some locations. Perhaps it could be turned off when desirable. We need to be careful not to invent something that is guaranteed to run slower than IEEE floating point. We should also try to avoid defining a type system that makes translation of numeric types to Java or C# types problematic.

That being said, standard semantics are a good thing, and should be the default behavior.

RFC 043: Integrate BigInts (and BigRats) Support Tightly With the Basic Scalars

This RFC suggests that a pragma enables the feature, but I think it should probably be tied to the run-time type system, which means it's driven more by how the data is created than by where it happens to be stored or processed. I don't see how we can make it a pragma, except perhaps to influence the meaning of "int" and "num" in actual declarations further on in the lexical scope:

    use bigint;
    my int $i;

might really mean

    my bigint $i;

or maybe just

    my int $i is bigint;

since representation specifications might just be considered part of the "fine print." But the whole subject of lexically scoped variable properties specifying the nature of the objects they contain is a bit problematic. A variable is a sort of mini-interface, a contract if you will, between the program and the object in question. Properties that merely influence how the program sees the object are not a problem -- when you declare a variable to be constant, you're promising not to modify the object through that variable, rather than saying something intrinsically true about the object. (Not that there aren't objects that are intrinsically constant.)

Other property declarations might need to have some say in how constructors are called in order to guarantee consistency between the variable's view of the object, and the nature of the object itself. In the worst case we could try to enforce consistency at run time, but that's apt to be slow. If every assignment of a Dog object to a Mammal variable has to check to see whether Dog is a Mammal, then the assignment is going to be a dog.

So we'll have to revisit this when we're defining the relationship between variable declarations and constructors. In any event, if we don't make Perl's numeric types automatically promote to big representations, we should at least make it easy to specify it when you want that to happen.

[Update: The Int type automatically upgrades to arbitrary precision internally. The int type does not.]

RFC 192: Undef Values ne Value

I've rejected this one, because I think something that's undefined should be considered just that, undefined. I think the standard semantics are useful for catching many kinds of errors.

That being said, it'll hopefully be easy to modify the standard operators within a particular scope, so I don't think we need to think that our way to think is the only way to think, I think.

RFC 212: Make `length(@array)` Work

Here's an oddity, an RFC that the author retracted, but that I accept, more or less. I think length(@array) should be equivalent to @array.length(), so if there's a length method available, it should be called.

The question is whether there should be a length method at all, for strings or arrays. It almost makes more sense for arrays than it does for strings these days, because when you talk about the length of a string, you need to know whether you're talking about byte length or character length. So we may split up the traditional length function into two, in which case we might end up with:

    $foo.chars
    $foo.bytes
    @foo.elems

Or some such. Whatever method names we choose, differentiating them would be more powerful in supplying context. For instance, one could envision calling @foo.bytes to return the byte length of all the strings. That wouldn't fly if we overloaded the method name.

Even chars($foo) might not be sufficiently precise, since, depending on how you're processing Unicode, you might want to know how long the string is in actual characters, not counting combining characters that don't take extra space. But that's a topic for later.

[Update: There is no length function. There are bytes, codes, graphs, and langs methods for the various Unicode support levels. (The chars method returns one of those values depending on the current Unicode support level.) Arrays and hashes report number of elements with the elems method.]
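
To make the byte/character split concrete, here is a minimal Perl 5 sketch of the three measurements that the proposed chars, bytes and elems methods would separate. It uses only the core Encode module; the variable names are illustrative, and the string is an arbitrary example containing one non-ASCII character.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $foo = "na\x{EF}ve";        # "naive" with i-diaeresis: five characters

my $chars = length $foo;                     # character count
my $bytes = length encode('UTF-8', $foo);    # byte count of the UTF-8 encoding

my @foo   = ('a', 'bb', 'ccc');
my $elems = scalar @foo;                     # element count of the array

print "chars=$chars bytes=$bytes elems=$elems\n";   # chars=5 bytes=6 elems=3
```

In Perl 5 a single length function has to cover the first two cases, which is why the answer depends on whether the string has been encoded yet.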

RFC 218: `my Dog $spot` Is Just an Assertion

I expect that a declaration of the form:

    my Dog $spot;

is merely an assertion that you will not use $spot inconsistently with it being a Dog. (But I mean something different by "assertion" than this RFC does.) This assertion may or may not be tested at every assignment to $spot, depending on pragmatic context. This bare declaration does not call a constructor; however, there may be forms of declaration that do. This may be necessary so that the variable and the object can pass properties back and forth, and in general, make sure they're consistent with each other. For example, you might declare an array with a multidimensional shape, and this shape property needs to be visible to the constructor, if we don't want to have to specify it redundantly.

On the other hand, we might be able to get assignment sufficiently overloaded to accomplish the same goal, so I'm deferring judgment on that. All I'm deciding here is that a bare declaration without arguments as above does not invoke a constructor, but merely tells the compiler something.

[Update: The constructor may be called using the .=new() construct.]

Other Decisions About Types

Built-in object types will be in all uppercase: INTEGER, NUMBER, STRING, REF, SCALAR, ARRAY, HASH, REGEX and CODE. Corresponding to at least some of these, there will also be lowercase intrinsic types, such as int, num, str and ref. Use of the lowercase typename implies you aren't intending to do anything fancy OO-wise with the values, or store any run-time properties, and thus Perl should feel free to store them compactly. (As a limiting case, objects of type bit can be stored in one bit.) This distinction corresponds roughly to the boxed/unboxed distinction of other computer languages, but it is likely that Perl 6 will attempt to erase the distinction for you to the extent possible. So, for instance, an int may still be used in a string context, and Perl will convert it for you, but it won't cache it, so the next time you use it as a string, it will have to convert again.

[Update: The object types are no longer all caps, but Int, Num, Str, etc.]

The declared type of an array or hash specifies the type of each element, not the type of an array or hash as a whole. This is justified by the notion that an array or hash is really just a strange kind of function that (typically) takes a subscript as an argument and returns a value of a particular type. If you wish to associate a type with the array or hash as a whole, that involves setting a tie property. If you find yourself wishing to declare different types on different elements, it probably means that you should either be using a class for the whole heterogeneous thing, or at least declare the type of array or hash that will be a base class of all the objects it will contain.

Of course, untyped arrays and hashes will be just as acceptable as they are currently. But a language can only run so fast when you force it to defer all type checking and method lookup till run time.

The intent is to make use of type information where it's useful, and not require it where it's not. Besides performance and safety, one other place where type information is useful is in writing interfaces to other languages. It is postulated that Perl 6 will provide enough optional type declaration syntax that it will be unnecessary to write XS-style glue in most cases.

[Update: Turns out one of the most important reasons for adding type information is that it allows for multimethod dispatch.]

Variables

RFC 071: Legacy Perl $pkg'var Should Die

I agree. I was unduly influenced by Ada syntax here, and it was a mistake. And although we're adding a properties feature into Perl 6 that is much like Ada's attribute feature, we won't make the mistake of reintroducing a syntax that drives highlighting editors nuts. We'll try to make different mistakes this time.

RFC 009: Highlander Variable Types

I basically agree with the problem this RFC is trying to solve, but I disagree with the proposed solution. The basic problem is that, while the idiomatic association of $foo[$bar] with @foo rather than $foo worked fine in Perl 4, when we added recursive data structures to Perl 5, it started getting in the way notationally, so that initial funny character was trying to do too much in both introducing the "root" of the reference, as well as the context to apply to the final subscript. This necessitated odd looking constructions like:

    $foo->[1][2][3]

This RFC proposes to solve the dilemma by unifying scalar variables with arrays and hashes at the name level. But I think people like to think of $foo, @foo and %foo as separate variables, so I don't want to break that. Plus, the RFC doesn't unify &foo, while it's perfectly possible to have a reference to a function as well as a reference to the more ordinary data structures.

So rather than unifying the names, I believe all we have to do is unify the treatment of variables with respect to references. That is, all variables may be thought of as references, not just scalars. And in that case, subscripts always dereference the reference implicit in the array or hash named on the left.

This has two major implications, however. It means that Perl programmers must learn to write @foo[1] where they used to write $foo[1]. I think most Perl 5 people will be able to get used to this, since many of them found the current syntax a bit weird in the first place.
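
For readers who want the Perl 5 behavior in front of them, the following sketch shows the notational wrinkle being discussed: $foo[1] subscripts the array @foo, while the unrelated scalar $foo needs an arrow to be subscripted. The variable contents are made up for illustration.

```perl
use strict;
use warnings;

my @foo = (10, 20, 30);            # the array @foo
my $foo = [ 'a', [ 'b', 'c' ] ];   # an unrelated scalar $foo holding a reference

my $from_array  = $foo[1];         # element of @foo, despite the $ sigil
my $from_scalar = $foo->[0];       # element reached through the scalar $foo
my $nested      = $foo->[1][1];    # the arrow is needed only after the root

print "$from_array $from_scalar $nested\n";   # 20 a c
```

Under the rule proposed here, the first access would be spelled @foo[1], leaving $foo[...] free to mean a subscript on the scalar.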

The second implication is that slicing needs a new notation, because subscripts no longer have their scalar/list context controlled by the initial funny character. Instead, the context of the subscript will need to be controlled by some combination of:

1. Context of the entire term.
2. Appearance of known list operators in the subscript, such as comma or range.
3. Explicit syntax casting the inside of the subscript to list or scalar context.
4. Explicit declaration of default behavior.

One thing that probably shouldn't enter into it is the run-time type of the array object, because context really needs to be calculated at compile time if at all possible.

In any event, it's likely that some people will want subscripts to default to scalars, and other people will want them to default to lists. There are good arguments for either default, depending on whether you think more like an APL programmer or a mere mortal.

[Update: Rvalue subscripts are always list context, but it's trivial to force scalar context with either of the + or ~ unary operators. Lvalue subscripts are scalar context unless the lvalue is in parentheses.]

There are other larger implications. If composite variables are thought of as scalar references, then the names @foo and %foo are really scalar variables unless explicitly dereferenced. That means that when you mention them in a scalar context, you get the equivalent of Perl 5's \@foo and \%foo. This simplifies the prototyping system greatly, in that an operator like push no longer needs to specify some kind of special reference context for its first argument -- it can merely specify a scalar context, and that's good enough to assume the reference generation on its first argument. (Of course, the function signature can always be more specific if it wants to. More about that in future installments.)

There are also implications for the assignment operator, in that it has to be possible to assign array references to array variables without accidentally invoking list context and copying the list instead of the reference to the list. We could invent another assignment operator to distinguish the two cases, but at the moment it looks as though bare variables and slices will behave as lvalues just as they do in Perl 5, while lists in parentheses will change to a binding of the right-hand arguments more closely resembling the way Perl 6 will bind formal arguments to actual arguments for function calls. That is to say,

    @foo = (1,2,3);

will supply an unbounded list context to the right side, but

    (@foo, @bar) = (@bar, @foo)

will supply a context to the right side that requests two scalar values that are array references. This will be the default for unmarked variables in an lvalue list, but there will be an easy way to mark formal array and hash parameters to slurp the rest of the arguments with list context, as they do by default in Perl 5.

(Alternately, we might end up leaving the ordinary list assignment operator with Perl 5 semantics, and define a new assignment operator such as := that does signatured assignment. I can argue that one both ways.)

[Update: We ended up with a := binding operator.]
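
As a point of comparison, this is roughly what Perl 5 does with the same assignment today: the right side is flattened, the first array on the left slurps everything, and getting "two array references" requires explicit backslashes. The sketch is plain Perl 5 and says nothing about how the Perl 6 binding will eventually be spelled.

```perl
use strict;
use warnings;

my @foo = (1, 2);
my @bar = (3, 4, 5);

# Perl 5 list assignment: the right side flattens and @foo takes it all.
(@foo, @bar) = (@bar, @foo);
print "foo=(@foo) bar=(@bar)\n";     # foo=(3 4 5 1 2) bar=()

# Asking for two array references has to be done by hand.
my @a = (1, 2);
my @b = (3, 4, 5);
my ($x, $y) = (\@b, \@a);
print scalar(@$x), " ", scalar(@$y), "\n";   # 3 2
```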

Just as arrays and hashes are explicitly dereferenced via subscripting (or implicitly dereferenced in list context), so too functions are merely named but not called by &foo, and explicitly dereferenced with parentheses (or by use as a bare name without the ampersand (or both)). The Perl 5 meanings of the ampersand are no longer in effect, in that ampersand will no longer imply that signature matching is suppressed -- there will be a different mechanism for that. And since &foo without parens doesn't do a call, it is no longer possible to use that syntax to automatically pass the @_ array -- you'll have to do that explicitly now with foo(@_).
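
The Perl 5 meanings being retired can be seen in a short sketch. The subroutine names are hypothetical; the point is that a bare &show call silently passes along the caller's @_, whereas show(@_) says so explicitly and \&show merely names the function.

```perl
use strict;
use warnings;

sub show { return join ',', @_ }

sub implicit { return &show; }       # Perl 5: called with the caller's current @_
sub explicit { return show(@_); }    # the explicit spelling recommended above
sub as_ref   { return \&show; }      # a reference only; nothing is called here

print implicit(1, 2), "\n";          # 1,2
print explicit(1, 2), "\n";          # 1,2
print as_ref()->(3, 4), "\n";        # 3,4
```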

Scalar variables are special, in that they may hold either references or actual "native" values, and there is no special dereference syntax as there is for other types. Perl 6 will attempt to hide the distinction as much as possible. That is, if $foo contains a native integer, calling the $foo.bar method will call a method on the built-in type. But if $foo contains a reference to some other object, it will call the method on that object. This is consistent with the way we think about overloading in Perl 5, so you shouldn't find this behavior surprising. It may take special syntax to get at any methods of the reference variable itself in this case, but it's OK if special cases are special.

[Update: The variable($foo) pseudo-function allows you to specify the container rather than the contained object.]

RFC 133: Alternate Syntax for Variable Names

This RFC has a valid point, but in fact we're going to do just the opposite of what it suggests. That is, we'll consider the funny characters to be part of the name, and use the subscripts for context. This works out better, because there's only one funny character, but many possible forms of dereferencing.

[Update: Nowadays we call those funny characters sigils. And for weirdly scoped variables there's a second character called a twigil.]

RFC 134: Alternative Array and Hash Slicing

We're definitely killing Perl 5's slice syntax, at least as far as relying on the initial character to determine the context of the subscript. There are many ways we could reintroduce a slicing syntax, some of which are mentioned in this RFC, but we'll defer the decision on that till Apocalypse 9 on Data Structures, since the interesting parts of designing slice syntax will be driven by the need to slice multidimensional arrays.

[Update: There is no Apocalypse 9, but there is a Synopsis 9 that covers these matters.]

For now we'll just say that arrays can have subscript signatures much like functions have parameter signatures. Ordinary one-dimensional arrays (and hashes) can then support some kind of simple slicing syntax that can be extended for more complicated arrays, while allowing multidimensional arrays to distinguish between simple slicing and complicated mappings of lists and functions onto subscripts in a manner more conducive to numerical programming.

On the subject of hash slices returning pairs rather than values, we could distinguish this with special slice syntax, or we could establish the notion of a hashlist context that tells the slice to return pairs rather than just values. (We may not need a special slice syntax for that if it's possible to typecast back and forth between pair lists and ordinary lists.)

[Update: Slicing to get a pairlist can be done by attaching a :p modifier to the subscript. In general though there's no such thing as a hashlist context. It's just that the list context supplied by assignment to a hash happens to know how to deal with pairs.]

RFC 196: More Direct Syntax for Hashes

This RFC makes three proposals, which we'll consider separately.

Proposal 1 is "that a hash in scalar context evaluate to the number of keys in the hash." (You can find that out now, but only by using the keys() function in scalar context.) Proposal 1 is OK if we change "scalar context" to "numeric context," since in scalar context a hash will produce a reference to the hash, which just happens to numify to the number of entries.

We must also realize that some implementations of hash might have to go through and count all the entries to return the actual number. Fortunately, in boolean context, it suffices to find a single entry to determine whether the hash contains anything. However, on hashes that don't keep track of the number of entries, finding even one entry might reset any active iterator on the hash, since some implementations of hash (in particular, the ones that don't keep track of the number of entries) may only supply a single iterator.

[Update: You may also call .elems to be more explicit.]
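
The Perl 5 behavior that Proposals 1 and 2 react to can be sketched as follows. The hash contents are arbitrary; the sketch relies on the documented facts that keys in scalar context returns the entry count and that calling keys resets the hash's single built-in iterator.

```perl
use strict;
use warnings;

my %h = (a => 1, b => 2, c => 3);

my $count = keys %h;             # keys in scalar context: number of entries
my $any   = %h ? 1 : 0;          # boolean context: is there anything at all?

# each() walks the hash's one built-in iterator...
my ($first) = each %h;
# ...and keys() resets it, so the next each() starts from the beginning again.
keys %h;
my ($again) = each %h;
keys %h;                         # leave the iterator reset for later code

print "count=$count any=$any same=", ($first eq $again ? 1 : 0), "\n";
```

Because there is only one iterator per hash in this model, any code that counts or tests the hash in the middle of an each loop risks restarting that loop.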

Proposal 2 is "that the iterator in a hash be reset through an explicit call to the reset() function." That's fine, with the proviso that it won't be a function, but rather a method on the HASH class.

[Update: All list contexts in Perl 6 are lazy by default, and different list contexts generate their own iterators, so all you have to do to "reset" an iterator is stop reading from the list in question.]

Proposal 3 is really about sort recognizing pairs and doing the right thing. Defaulting to sorting on $^a[0] cmp $^b[0] is likely to be reasonable, and that's where a pair's key would be found. However, it's probable that the correct solution is simply to provide a default string method for anonymous lists that happens to produce a decent key to sort on when cmp requests a string representation of either of its arguments. The sort itself should probably just concentrate on memoizing the returned strings so they don't have to be recalculated.

[Update: The sort interface has been completely revamped since this was written. This will eventually appear in S29, but as of now it's just in the perl6-language archives.]
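
In Perl 5 terms, the default comparison described under Proposal 3 and the idea of memoizing sort keys might look like the sketch below. Pairs are modeled as two-element anonymous arrays, and the %key_for cache and the lc key function are assumptions chosen only for illustration.

```perl
use strict;
use warnings;

my @pairs = ( [ pear => 3 ], [ apple => 7 ], [ fig => 1 ] );

# Sort on the first element, which is where a pair's key would live.
my @sorted = sort { $a->[0] cmp $b->[0] } @pairs;
print join(' ', map { "$_->[0]=$_->[1]" } @sorted), "\n";   # apple=7 fig=1 pear=3

# The same sort with each key computed once and remembered.
my %key_for;
my @cached = sort {
    ($key_for{$a} //= lc $a->[0]) cmp ($key_for{$b} //= lc $b->[0])
} @pairs;
print join(' ', map { $_->[0] } @cached), "\n";             # apple fig pear
```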

RFC 201: Hash Slicing

This RFC proposes to use % as a marker for special hash slicing in the subscript. Unfortunately, the % funny character will not be available for this use, since all hash refs will start with %. Concise list comprehensions will require some other syntax within the subscript, which will hopefully generalize to arrays as well.

Other Decisions About Variables

Various special punctuation variables are gone in Perl 6, including all the deprecated ones. (Non-deprecated variables will be replaced by some kind of similar functionality that is likely to be invoked through some kind of method call on the appropriate object. If there is no appropriate object, then a named global variable might provide similar functionality.)

Freeing up the various bracketing characters allows us to use them for other purposes, such as interpolation of expressions:

    "$(expr)"           # interpolate a scalar expression
    "@(expr)"           # interpolate a list expression

[Update: Those forms mean something else now (casting). Expression interpolation is normally done via closure.]
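
For contrast, Perl 5 has no direct expression interpolation, and the usual workaround is to dereference an anonymous reference inside the string. The sketch below shows that existing idiom only; it is not the Perl 6 syntax.

```perl
use strict;
use warnings;

my @n = (1, 2, 3);

# Scalar expression: take a reference to the value and dereference it in place.
my $sum = "sum: ${\ ($n[0] + $n[1]) }";

# List expression: build an anonymous array and interpolate its elements.
my $doubled = "doubled: @{[ map { $_ * 2 } @n ]}";

print "$sum\n$doubled\n";
```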

$#foo is gone. If you want the final subscript of an array, and [-1] isn't good enough, use @foo.end instead.
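
A small Perl 5 sketch of what is being replaced: $#foo gives the final subscript, and a negative subscript reaches the final element without it. The array contents are arbitrary.

```perl
use strict;
use warnings;

my @foo = qw(a b c d);

my $last_index = $#foo;          # final subscript (Perl 5 spelling)
my $last_elem  = $foo[-1];       # final element, no $#foo required
my $by_count   = @foo - 1;       # the same subscript derived from the count

print "$last_index $last_elem $by_count\n";   # 3 d 3
```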

Other special variables (such as the regex variables) will change from dynamic scoping to lexical scoping. It is likely that even $_ and @_ will be lexically scoped in Perl 6.

[Update: And indeed they are. But they happen to be a special kind of lexical variable called an "environment" variable, modeled on Unix environment variables. This allows subroutines to get at them and use them as defaults, in a pronominal sort of way.]

Names

In Perl 5, lexical scopes are unnamed and unnameable. In Perl 6, the current lexical scope will have a name that is visible within the lexical scope as the pseudo class MY, so that such a scope can, if it so chooses, delegate management of its lexical scope to some other module at compile time. In normal terms, that means that when you use a module, you can let it import things lexically as well as packagely.

[Update: The currently compiling lexical scope may also be named from anywhere as the COMPILING pseudopackage these days.]

Typeglobs are gone. Instead, you can get at a variable object through the symbol table hashes that are structured much like Perl 5's. The variable object for $MyPackage::foo is stored in:

    %MyPackage::{'$foo'}

Note that the funny character is part of the name. There is no longer any structure in Perl that associates everything with the name "foo".

[Update: The right way to say that now is "MyPackage::<$foo>". Hence the $foo variable in the scope currently being compiled is known as COMPILING::<$foo>.]
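
To see what is going away, here is a Perl 5 sketch of the typeglob arrangement: a single symbol table entry under the bare name "foo" leads to the scalar, the array and everything else of that name. MyPackage and its variables are the illustrative names from the text above.

```perl
use strict;
use warnings;

package MyPackage;
our $foo = 'scalar';
our @foo = (1, 2, 3);

package main;

# One glob, *MyPackage::foo, has a slot for each kind of variable.
my $scalar_ref = *MyPackage::foo{SCALAR};
my $array_ref  = *MyPackage::foo{ARRAY};
print "$$scalar_ref ", scalar(@$array_ref), "\n";            # scalar 3

# The symbol table hash has a single key for both of them, with no sigil.
my @names = grep { $_ eq 'foo' } keys %MyPackage::;
print "@names\n";                                            # foo
```

With the funny character stored as part of the name, the two variables would instead occupy separate entries and nothing would tie them together.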

Perl's special global names are stored in a special package named "*" because they're logically in every scope that does not hide them. So the unambiguous name of the standard input filehandle is $*STDIN, but a package may just refer to $STDIN, and it will default to $*STDIN if no package or lexical variable of that name has been declared.

[Update: We did s/STD// on those, so standard input is now just $*IN.]

Some of these special variables may actually be cloned for each lexical scope or each thread, so just because a name is in the special global symbol table doesn't mean it always behaves as a global across all modules. In particular, changes to the symbol table that affect how the parser works must be lexically scoped. Just because I install a special rule for my cool new hyperquoting construct doesn't mean everyone else should have to put up with it. In the limiting case, just because I install a Python parser, it shouldn't force other modules into a maze of twisty little whitespace, all alike.

Another way to look at it is that all names in the "*" package are automatically exported to every package and/or outer lexical scope.

[Update: The names are no longer automatically exported, but you can import them from the global namespace via "use GLOBALS '$IN', '$OUT';" and such.]

Literals

Underscores in Numeric Literals

Underscores will be allowed between any two digits within a number.
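
A quick sketch of what that looks like in practice, assuming present-day Raku behavior (the variable names are made up for the example):

```raku
# Underscores between digits are purely visual; the value is unchanged.
my $million = 1_000_000;
my $mask    = 0xFF_FF;

say $million == 1000000;   # True
say $mask == 65535;        # True
```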

RFC 105: Remove "In string @ must be \@" Fatal Error

Fine.

[Update: The interpolation rules for arrays have been completely revised. A bare array name no longer interpolates--you have to say @foo[].]

RFC 111: Here Docs Terminators (Was Whitespace and Here Docs)

Fine.

RFC 162: Heredoc contents

I think I like option (e) the best: remove whitespace equivalent to the terminator.

By default, if it has to dwim, it should dwim assuming that hard tabs are 8 spaces wide. This should not generally pose a problem, since most of the time the tabbing will be consistent throughout anyway, and no dwimming will be necessary. This puts the onus on people using nonstandard tabs to make sure they're consistent so that Perl doesn't have to guess.

Any additional mangling can easily be accomplished by a user-defined operator.

[Update: Here docs are now just a :to variant on extensible quotes, so any customization you can do to q/foo/ you can also do to q:to/END/.]
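
To make "remove whitespace equivalent to the terminator" concrete, here is a sketch assuming the q:to behavior of present-day Raku. The text and variable name are invented for the example, and only spaces are used, so no tab guessing is involved.

```raku
# The terminator is indented four spaces, so four spaces are
# stripped from the front of every line of the here doc.
my $text = q:to/END/;
        indented more
    flush with the terminator
    END

say $text eq "    indented more\nflush with the terminator\n";   # True
```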

RFC 139: Allow Calling Any Function With a Syntax Like s///

Creative quoting will be allowed with lexical mutation, but we can't parse foo(bar) two different ways simultaneously, and I'm unwilling to prevent people from using parens as quote characters. I don't see how we can reasonably have new quote operators without explicit declaration. And if the utility of a quote-like operator is sufficient, there should be little relative burden in requiring such a declaration.

The form of such a declaration is left to the reader as an exercise in function property definition. We may revisit the question later in this series. It's also possible that a quote operator such as qx// could have a corresponding function name like quote:qx that could be invoked as a function.

RFC 222: Interpolation of Object Method Calls

I've been hankering for methods to interpolate for a long time, so I'm in favor of this RFC. And it'll become doubly important as we move toward encouraging people to use accessor methods to refer to object attributes outside the class itself.

I have one "but," however. Since we'll switch to using . instead of ->, I think for sanity's sake we may have to require the parentheses, or "$file.$ext" is going to give people fits. Not to mention "$file.ext".

[Update: Nowadays we also require brackets on array interpolations and braces on hash interpolations. See S03 for more.]
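
A short sketch of why the parentheses matter, assuming present-day Raku interpolation rules. The variables are invented, and uc is simply a convenient built-in string method.

```raku
my $file = "report";
my $ext  = "txt";

say "$file.$ext";    # report.txt  (two variables and a literal dot)
say "$file.ext";     # report.ext  (no parens, so no method call)
say "$file.uc()";    # REPORT      (parens present, so the method is called)
```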

RFC 226: Selective Interpolation in Single Quotish Context.

This proposal has much going for it, but there are also difficulties, and I've come close to rejecting it outright simply because the single-quoting policy of Perl 5 has been successful. And I think the proposal in this RFC for \I...\E is ugly. (And I'd like to kill \E anyway, and use bracketed scopings.)

However, I think there is a major "can't get there from here" that we could solve by treating interpolation into single quotes as something hard, not something easy. The basic problem is that it's too easy to run into a \$ or \@ (or a \I for that matter) that wants to be taken literally. I think we could allow the interpolation of arbitrary expressions into single-quoted strings, but only if we limit it to an unlikely sequence where three or more characters are necessary for recognition. The most efficient mental model would seem to be the idea of embedding one kind of quote in another, so I think this:

    \q{stuff}

will embed single-quoted stuff, while this:

    \qq{stuff}

will embed double-quoted stuff. A variable could then be interpolated into a single-quoted string by saying:

    \qq{$foo}
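
Here is a hedged sketch of the idea as it can be written in present-day Raku, where the documented spelling I am sure of uses square brackets rather than curlies. The variable and the text are invented for the example.

```raku
my $total = 42;

# Single quotes leave $USD alone; only the \qq[...] island interpolates.
my $line = 'Price in $USD, total: \qq[$total]';
say $line;   # Price in $USD, total: 42
```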

RFC 237: Hashes Should Interpolate in Double-Quoted Strings

I agree with this RFC in principle, but we can't define the default hash stringifier in terms of variables that are going away in Perl 6, so the RFC's proposal of using $" is right out.

All objects should have a method by which they produce readable output. How this may be overridden by user preference is open to debate. Certainly, dynamic scoping has its problems. But lexical override of an object's preferences is also problematic. Individual object properties appear to give a decent way out of this. More on that below.

[Update: Hash values by default interpolate with tabs between key and value, and with newline between pairs. But you can give it a specific format with the .as method.]

On printf formats, I don't see any way to dwim that %d isn't an array, so we'll just have to put formats into single quotes in general. Those format strings that also interpolate variables will be able to use the new \qq{$var} feature.

[Update: Since hash interpolations require braces now, printf formats are safe again (unless they happen to be followed by curlies).]

Note for those who are thinking we should just stick with Perl 5 interpolation rules: We have to allow % to introduce interpolation now because individual hash values are no longer named with $foo{$bar}, but rather %foo{$bar}. So we might as well allow interpolation of complete hashes.

RFC 251: Interpolation of Class Method Calls

Class method calls are relatively rare (except for constructors, which will be rarely interpolated). So rather than scanning for identifiers that might introduce a class, I think we should just depend on expression interpolation instead:

    "There are $(Dog.numdogs) dogs."

[Update: That's now done with closure interpolation.]
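
For comparison, a minimal sketch of the closure form in present-day Raku. The Dog class and its counter are hypothetical, written only to give numdogs something to report.

```raku
class Dog {
    my $count = 0;                        # shared by all Dog objects
    method new()     { $count++; self.bless }
    method numdogs() { $count }
}

Dog.new;
Dog.new;

# Any expression inside curlies is evaluated and interpolated.
say "There are {Dog.numdogs} dogs.";   # There are 2 dogs.
```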

RFC 252: Interpolation of Subroutines

I think subroutines should interpolate, provided they're introduced with the funny character. (On the other hand, how hard is $(sunset $date) or @(sunset $date)? On the gripping hand, I like the consistency of & with $, @ and %.)

I think the parens are required, since in Perl 6, scalar &sub will just return a reference, and require parens if you really want to deref the sub ref. (It's true that a subroutine can be called without parens when used as a list operator, but you can't interpolate those without a funny character.)

For those worried about the use of & for signature checking suppression, we should point out that & will no longer be the way to suppress signature checking in Perl 6, so it doesn't matter.

RFC 327: \v for Vertical Tab

I think the opportunity cost of not reserving \v for future use is too high to justify the small utility of retaining compatibility with a feature virtually nobody uses anymore. For instance, I almost used \v and \V for switching into and out of verbatim (single-quote) mode, until I decided to unify that with quoting syntax and use \qq{} and \q{} instead.

[Update: Turns out that \v matches vertical whitespace in patterns, which conveniently includes vertical tab--whatever that is... Also we now have \h for horizontal whitespace.]

RFC 328: Single quotes don't interpolate \' and \\

I think hyperquotes will be possible with a declaration of your quoting rules, so we're not going to change the basic single-quote rules (except for supporting \q).

[Update: There are adverbial modifiers now that can do hyperquoting. See S02.]

Other Decisions About Literals

Scoping of \L et al.

I'd like to get rid of the gratuitously ugly \E as an end-of-scope marker. Instead, if any sequence such as \L, \U or \Q wishes to impose a scope, then it must use curlies around that scope: \L{stuff}, \U{stuff} or \Q{stuff}. Any literal curlies contained in stuff must be backslashed. (Curlies as syntax (such as for subscripts) should nest correctly.)

[Update: Those constructs are now gone entirely. Use closure interpolation to interpolate the value of an expression.]

Bareword Policy

There will be no barewords in Perl 6. Any bare name that is a declared package name will be interpreted as a class object that happens to stringify to the package name. All other bare names will be interpreted as subroutine or method calls. For nonstrict applications, undefined subroutines will autodefine themselves to return their own name. Note that in ${name} and friends, the name is considered autoquoted, not a bareword.

[Update: The ${name} construct is gone. Use closure interpolation to disambiguate expression interpolations: "{$name}text". Use $($ref) or $$ref for hard dereferences. Use $::($name) for symbolic dereferences.]

Weird brackets

Use of brackets to disambiguate

    "${foo[bar]}"

from

    "${foo}[bar]"

will no longer be supported. Instead, the expression parser will always grab as much as it can, and you can make it quit at a particular point by interpolating a null string, specified by \Q:

    "$foo\Q[bar]"

[Update: That's gone too. Just use closure interpolation to disambiguate.]
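
A sketch of that disambiguation, assuming present-day Raku. The names are invented; note that the scalar and the array are unrelated variables.

```raku
my $foo = "item";
my @foo = <a b c>;

say "{$foo}[1]";   # item[1]  (the closure ends the expression; [1] is literal text)
say "@foo[1]";     # b        (the subscript belongs to the array)
```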

Special tokens

Special tokens will turn into either POD directives or lexically scoped OO methods under the MY pseudo-package:

    Old                 New
    ---                 ---
    __LINE__            MY.line
    __FILE__            MY.file
    __PACKAGE__         MY.package
    __END__             =begin END      (or remove)
    __DATA__            =begin DATA

[Update: The first three are now $?LINE, $?FILE, and $?PACKAGE. There are other such variables too. See S02.]
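
A brief sketch of the updated spellings, assuming present-day Raku. The Kennel package is hypothetical.

```raku
say $?LINE;    # the line number of this statement
say $?FILE;    # the path of the file being compiled

package Kennel {
    # $?PACKAGE is the package object itself; .^name gives its name.
    say $?PACKAGE.^name;   # Kennel
}
```

These are compile-time values, so they cost nothing to look up at run time.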

Heredoc Syntax

I think heredocs will require quotes around any identifier, and we need to be sure to support << qq(END) style quotes. Space is now allowed before the (required) quoted token. Note that custom quoting is now possible, so if you define a fancy qh operator for your fancy hyperquoting algorithm, then you could say <<qh(END).

It is still the case that you can say <<"" to grab everything up to the next blank line. However, Perl 6 will consider any line containing only spaces, tabs, etc., to be blank, not just the ones that immediately terminate with newline.

[Update: q:to/END/ is now how you form a here doc.]

Context

In Perl 5, a lot of contextual processing was done at run-time, and even then, a given function could only discover whether it was in void, scalar or list context. In Perl 6, we will extend the notion of context to be more amenable to both compile-time and run-time analysis. In particular, a function or method can know (theoretically even at compile time) when it is being called in:

    Void context
    Scalar context

This file is part of the Perl 6 Archive