# NAME

Sys::Binmode - Fix Perl’s system call character encoding.

<div>
    <a href='https://coveralls.io/github/FGasper/p5-Sys-Binmode?branch=master'><img src='https://coveralls.io/repos/github/FGasper/p5-Sys-Binmode/badge.svg?branch=master' alt='Coverage Status' /></a>
</div>

# SYNOPSIS

    use Sys::Binmode;

    my $foo = "é";
    $foo .= "\x{100}";
    chop $foo;

    # Prints “é”:
    print $foo, $/;

    # In Perl 5.32 this may print mojibake,
    # but with Sys::Binmode it always prints “é”:
    exec 'echo', $foo;

# DESCRIPTION

tl;dr: Use this module in **all** new code.

# BACKGROUND

Ideally, a Perl application doesn’t need to know how the interpreter stores
a given string internally. Perl can thus store any Unicode code point while
still optimizing for size and speed when storing “bytes-compatible”
strings, i.e., strings whose code points all lie below 256. Perl’s
“optimized” string storage format is faster and less memory-hungry, but it
can only store code points 0-255. The “unoptimized” format, on the other
hand, can store any Unicode code point.

Of course, Perl doesn’t _always_ optimize “bytes-compatible” strings;
if it wants, Perl can also store such strings “unoptimized” (i.e., in
Perl’s internal “loose UTF-8” format). For code points 0-127 there’s
actually no difference between the two forms, but for 128-255 the formats
differ (cf.
["The "Unicode Bug"" in perlunicode](https://metacpan.org/pod/perlunicode#The-Unicode-Bug)). This means that anything that reads
Perl’s internals **MUST** differentiate between the two forms in order to
use the string correctly.
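The difference between the two forms can be made visible with a small sketch. It assumes nothing beyond core Perl: `utf8::upgrade()` forces the “unoptimized” form, `utf8::is_utf8()` reports which form a string is stored in, and `bytes::length()` is used here purely as a diagnostic to count the internal bytes.

```perl
use strict;
use warnings;

# One character, code point 0xE9 (233): a "bytes-compatible" string.
my $downgraded = "\xe9";
my $upgraded   = "\xe9";

# Force the "unoptimized" (internal UTF-8) storage form on one copy.
utf8::upgrade($upgraded);

# To Perl code the two strings are the same string...
my $equal = ( $downgraded eq $upgraded ) ? 1 : 0;

# ...but they are stored differently.
my @flags = map { utf8::is_utf8($_) ? 1 : 0 } $downgraded, $upgraded;

# Internal byte counts (diagnostic only; not something applications
# should normally look at).
require bytes;
my @sizes = ( bytes::length($downgraded), bytes::length($upgraded) );

print "equal=$equal flags=@flags internal-bytes=@sizes\n";
```

Code that reads the internal buffer without checking the flag would see one byte in the first case and two in the second, even though both variables hold the same one-character string.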

Alas, that differentiation doesn’t always happen. Thus, Perl can
output a string that stores one or more 128-255 code points
differently depending on whether Perl has “optimized” that string or not.

Remember, though: Perl applications _should not care_ about
Perl’s string storage internals. (This is why, for example, the [bytes](https://metacpan.org/pod/bytes)
pragma is discouraged.) The catch, though, is that without that knowledge,
**the application can’t know what it actually says
to the outside world!**

Thus, applications must either monitor Perl’s string-storage internals
or accept unpredictable behaviour, both of which are categorically bad.

# HOW THIS MODULE (PARTLY) FIXES THE PROBLEM

This module provides predictable behaviour for Perl’s built-in functions by
downgrading all strings before giving them to the operating system. It’s
equivalent to (but faster than!) prefixing your system calls with
`utf8::downgrade()` (cf. [utf8](https://metacpan.org/pod/utf8)) on all arguments.
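For comparison, here is a minimal sketch of that manual alternative, i.e., what you would write without this module. The helper name `downgraded()` is hypothetical and is not part of Sys::Binmode; it only illustrates the `utf8::downgrade()` approach mentioned above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Sys::Binmode): returns downgraded
# copies of its arguments. utf8::downgrade() dies if a string contains
# a code point above 255, because such a string cannot be stored as bytes.
sub downgraded {
    my @copies = @_;
    utf8::downgrade($_) for @copies;
    return @copies;
}

my $foo = "\xe9";
utf8::upgrade($foo);    # simulate Perl having stored the string "unoptimized"

my @args = downgraded( 'echo', $foo );

# The copy is now stored downgraded, so the OS would receive the
# single byte 0xE9 regardless of how $foo itself is stored.
my $is_downgraded = utf8::is_utf8( $args[1] ) ? 0 : 1;
print "downgraded: $is_downgraded\n";

# system @args;    # the actual system call, left commented out in this sketch
```

Doing this by hand before every system call is easy to forget, which is the case for letting the module handle it.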

Predictable behaviour is **always** a good thing; ergo, you should
use this module in **all** new code.

# CAVEAT: CHARACTER ENCODING

If you apply this module injudiciously to existing code you may see
exceptions thrown where previously things worked just fine. This can
happen if you’ve neglected to encode one or more strings before
sending them to the OS; if Perl has such a string stored upgraded then
Perl will, under default behaviour, send a UTF-8-encoded
version of that string to the OS. In essence, it’s an implicit
UTF-8 auto-encode.

The fix is to apply an explicit UTF-8 encode prior to the system call
that throws the error. This is what we should do _anyway_;
Sys::Binmode just enforces that better.
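A minimal sketch of that fix follows, using the core `Encode` module. The variable names are illustrative, and `utf8::downgrade()` with its fail-OK flag stands in here for the downgrade check described earlier; the sketch does not call Sys::Binmode itself.

```perl
use strict;
use warnings;
use Encode ();

# A character (decoded) string containing a code point above 255.
my $name = "caf\x{e9}-\x{263a}";

# Unencoded, the string cannot be downgraded. The second argument makes
# utf8::downgrade() return false on failure instead of dying.
my $raw_copy = $name;
my $raw_ok   = utf8::downgrade( $raw_copy, 1 ) ? 1 : 0;

# Explicit encode: the result holds only code points 0-255 (i.e., bytes),
# so it can always be downgraded and handed to the OS predictably.
my $bytes      = Encode::encode( 'UTF-8', $name );
my $bytes_copy = $bytes;
my $bytes_ok   = utf8::downgrade( $bytes_copy, 1 ) ? 1 : 0;

printf "raw downgradable: %d, encoded downgradable: %d, encoded length: %d\n",
    $raw_ok, $bytes_ok, length $bytes;
```

The encoded string, not the original character string, is what should be passed to `open`, `system`, and the other affected built-ins.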

## Windows (et alia)

NTFS, Windows’s primary filesystem, expects filenames to be encoded in
little-endian UTF-16. To create a file named `épée`, then, on NTFS
you have to do something like:

    use Encode::Simple ();
    my $windows_filename = Encode::Simple::encode( 'UTF-16LE', $filename );

… where `$filename` is a character (i.e., decoded) string.

Other OSes and filesystems may have their own quirks; regardless, this
module gives you a saner point of departure to address those
than Perl’s default behaviour provides.

# WHERE ELSE THIS PROBLEM CAN APPEAR

The unpredictable-behaviour problem that this module fixes in core Perl is
also common in XS modules due to rampant
use of [the SvPV macro](https://perldoc.perl.org/perlapi#SvPV) and
variants. SvPV is like the [bytes](https://metacpan.org/pod/bytes) pragma in C: it gives you the string’s
internal bytes with no regard for what those bytes represent. XS authors
_generally_ should prefer
[SvPVbyte](https://perldoc.perl.org/perlapi#SvPVbyte)
or [SvPVutf8](https://perldoc.perl.org/perlapi#SvPVutf8) in lieu of
SvPV unless the C code in question deals with Perl’s encoding abstraction.

Note in particular that, as of Perl 5.32, the default XS typemap converts
scalars to C `char *` and `const char *` via an SvPV variant. This means
that any module that uses that conversion logic also has this problem.
So XS authors should also avoid the default typemap for such conversions.

# LEXICAL SCOPING

If, for some reason, you _want_ Perl’s unpredictable default behaviour,
you can disable this module for a given block via
`no Sys::Binmode`, thus:

    use Sys::Binmode;

    system 'echo', $foo;        # predictable/sane/happy

    {

        # You should probably explain here why you’re doing this.
        no Sys::Binmode;

        system 'echo', $foo;    # nasal demons
    }

# AFFECTED BUILT-INS

- `exec` and `system`
- `do` and `require`
- File tests (e.g., `-e`) and the following:
`chdir`, `chmod`, `chown`, `chroot`,
`link`, `lstat`, `mkdir`, `open`, `opendir`, `readlink`, `rename`,
`rmdir`, `stat`, `symlink`, `sysopen`, `truncate`,
`unlink`, `utime`
- `bind`, `connect`, and `setsockopt`
- `syscall`

# TODO

- `dbmopen` and the System V IPC functions aren’t covered here.
If you’d like them, ask.
- There’s room for optimization, if that’s gainful.
- Ideally this behaviour should be in Perl’s core distribution.
- Even more ideally, Perl should adopt this behaviour as _default_.
Maybe someday!

# ACKNOWLEDGEMENTS

Thanks to Leon Timmermans (LEONT) and Paul Evans (PEVANS) for some
debugging and design help.

# LICENSE & COPYRIGHT

Copyright 2021 Gasper Software Consulting. All rights reserved.

This library is licensed under the same license as Perl.