# NAME Sys::Binmode - Fix Perl�䏭 system call character encoding. <div> <a href='https://coveralls.io/github/FGasper/p5-Sys-Binmode?branch=master'><img src='https://coveralls.io/repos/github/FGasper/p5-Sys-Binmode/badge.svg?branch=master' alt='Coverage Status' /></a> </div> # SYNOPSIS use Sys::Binmode; my $foo = "矇"; $foo .= "\x{100}"; chop $foo; # Prints �𧟌抽��: print $foo, $/; # In Perl 5.32 this may print mojibake, # but with Sys::Binmode it always prints �𧟌抽��: exec 'echo', $foo; # DESCRIPTION tl;dr: Use this module in **all** new code. # BACKGROUND Ideally, a Perl application doesn�脌 need to know how the interpreter stores a given string internally. Perl can thus store any Unicode code point while still optimizing for size and speed when storing �營ytes-compatible�� strings�㻳.e., strings whose code points all lie below 256. Perl�䏭 �㙟ptimized�� string storage format is faster and less memory-hungry, but it can only store code points 0-255. The �崬noptimized�� format, on the other hand, can store any Unicode code point. Of course, Perl doesn�脌 _always_ optimize �營ytes-compatible�� strings; Perl can also, if it wants, store such strings �崬noptimized�� (i.e., in Perl�䏭 internal �𦧺oose UTF-8�� format), too. For code points 0-127 there�䏭 actually no difference between the two forms, but for 128-255 the formats differ. (cf. ["The "Unicode Bug"" in perlunicode](https://metacpan.org/pod/perlunicode#The-Unicode-Bug)) This means that anything that reads Perl�䏭 internals **MUST** differentiate between the two forms in order to use the string correctly. Alas, that differentiation doesn�脌 always happen. Thus, Perl can output a string that stores one or more 128-255 code points differently depending on whether Perl has �㙟ptimized�� that string or not. Remember, though: Perl applications _should_ _not_ _care_ about Perl�䏭 string storage internals. (This is why, for example, the [bytes](https://metacpan.org/pod/bytes) pragma is discouraged.) The catch, though, is that without that knowledge, **the** **application** **can�脌** **know** **what** **it** **actually** **says** **to** **the** **outside** **world!** Thus, applications must either monitor Perl�䏭 string-storage internals or accept unpredictable behaviour, both of which are categorically bad. # HOW THIS MODULE (PARTLY) FIXES THE PROBLEM This module provides predictable behaviour for Perl�䏭 built-in functions by downgrading all strings before giving them to the operating system. It�䏭 equivalent to�𥿡ut faster than!�䕑refixing your system calls with `utf8::downgrade()` (cf. [utf8](https://metacpan.org/pod/utf8)) on all arguments. Predictable behaviour is **always** a good thing; ergo, you should use this module in **all** new code. # CAVEAT: CHARACTER ENCODING If you apply this module injudiciously to existing code you may see exceptions thrown where previously things worked just fine. This can happen if you�脎e neglected to encode one or more strings before sending them to the OS; if Perl has such a string stored upgraded then Perl will, under default behaviour, send a UTF-8-encoded version of that string to the OS. In essence, it�䏭 an implicit UTF-8 auto-encode. The fix is to apply an explicit UTF-8 encode prior to the system call that throws the error. This is what we should do _anyway_; Sys::Binmode just enforces that better. ## Windows (et alia) NTFS, Windows�䏭 primary filesystem, expects filenames to be encoded in little-endian UTF-16. To create a file named `矇p矇e`, then, on NTFS you have to do something like: my $windows_filename = Encode::Simple::encode( 'UTF-16LE', $filename ); �� where `$filename` is a character (i.e., decoded) string. Other OSes and filesystems may have their own quirks; regardless, this module gives you a saner point of departure to address those than Perl�䏭 default behaviour provides. # WHERE ELSE THIS PROBLEM CAN APPEAR The unpredictable-behaviour problem that this module fixes in core Perl is also common in XS modules due to rampant use of [the SvPV macro](https://perldoc.perl.org/perlapi#SvPV) and variants. SvPV is like the [bytes](https://metacpan.org/pod/bytes) pragma in C: it gives you the string�䏭 internal bytes with no regard for what those bytes represent. XS authors _generally_ should prefer [SvPVbyte](https://perldoc.perl.org/perlapi#SvPVbyte) or [SvPVutf8](https://perldoc.perl.org/perlapi#SvPVutf8) in lieu of SvPV unless the C code in question deals with Perl�䏭 encoding abstraction. Note in particular that, as of Perl 5.32, the default XS typemap converts scalars to C `char *` and `const char *` via an SvPV variant. This means that any module that uses that conversion logic also has this problem. So XS authors should also avoid the default typemap for such conversions. # LEXICAL SCOPING If, for some reason, you _want_ Perl�䏭 unpredictable default behaviour, you can disable this module for a given block via `no Sys::Binmode`, thus: use Sys::Binmode; system 'echo', $foo; # predictable/sane/happy { # You should probably explain here why you�胩e doing this. no Sys::Binmode; system 'echo', $foo; # nasal demons } # AFFECTED BUILT-INS - `exec` and `system` - `do` and `require` - File tests (e.g., `-e`) and the following: `chdir`, `chmod`, `chown`, `chroot`, `link`, `lstat`, `mkdir`, `open`, `opendir`, `readlink`, `rename`, `rmdir`, `stat`, `symlink`, `sysopen`, `truncate`, `unlink`, `utime` - `bind`, `connect`, and `setsockopt` - `syscall` # TODO - `dbmopen` and the System V IPC functions aren�脌 covered here. If you�羮 like them, ask. - There�䏭 room for optimization, if that�䏭 gainful. - Ideally this behaviour should be in Perl�䏭 core distribution. - Even more ideally, Perl should adopt this behaviour as _default_. Maybe someday! # ACKNOWLEDGEMENTS Thanks to Leon Timmermans (LEONT) and Paul Evans (PEVANS) for some debugging and design help. # LICENSE & COPYRIGHT Copyright 2021 Gasper Software Consulting. All rights reserved. This library is licensed under the same license as Perl.