pizlonator 12 hours ago

My own perf comparison: when I switched Fil-C's yololand from my system's libc (a recent glibc) to my own build of musl, I got a 1-2% perf regression. My best guess is that glibc's memcpy/memmove/memset are better. It couldn't have been the allocator, since Fil-C's runtime has its own allocator.

  • skissane 7 hours ago

    What's Fil-C? Okay, found it myself, looks cool: https://github.com/pizlonator/llvm-project-deluge/

    What's yoyoland? All I can find is an amusement park in Bangkok, and some 1990s-era communication software for Classic Mac OS: https://www.macintoshrepository.org/39495-yoyo-2-1

    • pizlonator 6 hours ago

      The Fil-C stack is composed of:

      - Userland: the place where your C code lives. Like the normal userland you're familiar with, but everything is compiled with Fil-C, so it's memory safe.

      - Yololand: the place where Fil-C's runtime lives. Fil-C's runtime is about 100,000 lines of C code (almost entirely written by me), which currently has libc as a dependency (the runtime makes syscalls through the normal libc wrapper functions rather than using assembly directly, and it also relies on a handful of libc utility functions that aren't syscalls, like memcpy).

      So Fil-C has two libcs: the yololand libc (compiled with a normal C compiler, there only to support the runtime) and the userland libc (compiled with the Fil-C compiler like everything else in Fil-C userland; this is what your C code calls into).

      • skissane 5 hours ago

        Why does yololand need to use libc's memcpy? Can't you just use __builtin_memcpy?

        On Linux, if all you need is syscalls, you can just write your own syscall wrappers, like Go does.

        That doesn't work on some other operating systems (e.g. Solaris/illumos, OpenBSD, macOS, Windows), where the system call interface is private to the system shared libraries.

        • pizlonator 3 hours ago

          > Why does yololand need to use libc's memcpy? Can't you just use __builtin_memcpy?

          Unless you do special things, the compiler turns __builtin_memcpy into a call to memcpy. :-)

          There is __builtin_memcpy_inline, but then you're at the compiler's whims. I don't think I want that.

          A faithful implementation of what you're proposing would have the Fil-C runtime provide a memcpy function so that whenever the compiler wants to call memcpy, it will call that function.

          > On Linux, if all you need is syscalls, you can just write your own syscall wrappers, like Go does.

          I could do that. I just don't, right now.

          You're totally right that I could remove the yolo libc. This is one of like 1,000 reasons why Fil-C is slower than it needs to be right now. It's a young project so it has lots of this kind of "expedient engineering".

  • abnercoimbre 10 hours ago

    Interesting! Will you stick around with the musl build? And if so, why?

    • pizlonator 8 hours ago

      Not sure, but I'm likely to, because right now I use the same libc in userland (the Fil-C-compiled part) and yololand (the part compiled by normal C that sits below the runtime), and the userland libc is musl.

      Having them be the same means that if there is any libc function that is best implemented by having userland call a Fil-C runtime wrapper for the yololand implementation (say, because what it's doing requires platform-specific assembly), then I can be sure that the yololand libc really implements that function the same way, with all the same corner cases.

      But there aren't many cases of that, and they're hacks that I might someday remove. So I probably won't have this "libc sandwich" forever.

  • LukeShu 7 hours ago

    When I was working with Envoy Proxy, it was known that perf was worse with musl than with glibc. We went through silly hoops to have a glibc Envoy running in an Alpine (musl) container.

ObscureScience 15 hours ago

That table is unfortunately quite old. I can't personally say what has changed, but it's hard to put much confidence in the relevance of the information.

  • lifthrasiir 8 hours ago

    Yeah, and it doesn't compare actual implementations, just checkboxes. I'm aware of two specific substantial performance regressions in musl: exact floating-point printing (it uses Dragon4, but the implementation is much slower than it could be) and the memory allocator (for a long time it didn't have any kind of arena, unlike pretty much every modern allocator; now it does, with mallocng).

thrtythreeforty 13 hours ago

It really ought to lead with the license of each library. I was considering dietlibc until I got to the bottom: GPLv2. I am a GPL apologist, and even I can appreciate that this is a nonstarter; even GNU's libc is only LGPL!

  • LeFantome 12 hours ago

    musl seems to have displaced dietlibc. It's much more complete, yet still fairly small and light.

    • yusina 10 hours ago

      Note that dietlibc is the project of a sole coder in the CCC sphere from Berlin (Fefe). His main objective was to learn how low-level infra is implemented, and he started using it in some of his other projects after realizing there was a lot of bloat he could skip by implementing just the bare essentials. musl has a different set of objectives.

      • projektfu 6 hours ago

        I follow dietlibc, but it is definitely not ready for general use like musl, and probably never will be. There aren't a lot of eyeballs on it.

        • yusina 17 minutes ago

          That's what I'm saying. It's not Fefe's objective to make it fit for everybody...

jay-barronville 13 hours ago

Please note that the linked comparison table has been unmaintained for a while. This is even explicitly stated on the legacy musl libc website[0] (i.e., “The (mostly unmaintained) libc comparison is still available on etalabs.net.”).

[0]: https://www.musl-libc.org

moomin 11 hours ago

No cosmopolitan, pity.

casey2 2 hours ago

Where is the "# of regressions caused" box?

snickerer 15 hours ago

Fun libc comparison by the author of musl.

My takeaway is: glibc is bloated but fast. Quite an unexpected combination. Am I right?

  • kstrauser 14 hours ago

    It’s not shocking. More complex implementations using more sophisticated algorithms can be faster. That’s not always true, but it often is. For example, look at some of the string search algorithms used by things like ripgrep. They’re way more complex than just looping across the input and matching character by character, and they pay off.

    Something like glibc has had decades to swap in complex, fast code for simple-looking functions.

    • weinzierl 14 hours ago

      In the case of glibc, I think what you said is orthogonal to its bloat. Yes, it has complex implementations, but since they exist for a good reason, I'd hardly call them bloat.

      Independently from that glibc implements a lot of stuff that could be considered bloat:

      - Extensive internationalization support

      - Extensive backward compatibility

      - Support for numerous architectures and platforms

      - Comprehensive implementations of optional standards

      • kstrauser 14 hours ago

        Ok, fair points, although internationalization seems like a reasonable thing to include at first glance.

        Is there a fork of glibc that strips ancient or bizarre platforms?

        • SAI_Peregrinus 11 hours ago

          It's called glibc. Essentially all of that "bloat" is conditionally compiled; if your target isn't an ancient or bizarre platform, it won't get included in the runtime.

          • kstrauser 11 hours ago

            That’s mostly true, but not quite. For instance, suppose you aim to support all of 32/64-bit and little/big-endian. You’ll likely end up factoring straightforward math operations out into standalone functions. Granted, those will probably get inlined, but it may mean your structure is more abstracted than it would be otherwise. Just supporting the options has implications.

            That’s not the strongest example. I just meant it to be illustrative of the idea.

            • jcranmer 8 hours ago

              The way glibc's source works (for something like math functions) is that essentially every function is implemented in its own file, and various config knobs can provide extra directories to compile and provide function definitions. This can make it hard to find the implementation that will actually be used: a naive search for the function name can turn up twenty-odd different definitions, and working out which one is actually in play can be tricky (especially since it depends on more than just the architecture name).

              Math functions aren't going to be strongly impacted by diverse hardware support. In practice, you largely care about 32-bit and 64-bit IEEE 754 types, which means your macros to decompose floating-point types to their constituent sign/exponent/significand fields are already going to be pretty portable even across different endianness (just bitcast to a uint32_t/uint64_t, and all of the shift logic will remain the same). And there's not much reason to vary the implementation except to take advantage of hardware instructions that implement the math functions directly... which are generally better handled by the compiler anyways.

              • saagarjha 2 hours ago

                People don't typically implement math functions by pulling bits out of a reinterpreted floating point number. If you rely on the compiler, you get whatever it decides for you, which might be something dumb like float80.

        • dima55 12 hours ago

          What problem are you trying to solve? glibc works just fine for most use cases. If you have some niche requirements, you have alternative libraries you can use (listed in the article). Forking glibc in the way you describe is literally pointless.

          • kstrauser 11 hours ago

            Nothing really. I was just curious and this isn’t something I know much about, but would like to learn more of.

  • LeFantome 12 hours ago

    A lot of the “slowness” of musl is the default allocator. It can be swapped out.

    For example, Chimera Linux uses musl with mimalloc, and it is quite snappy.
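    One common way to try an alternative allocator without rebuilding anything is dynamic-loader interposition (the library path below is a placeholder; Chimera builds mimalloc in directly, but on other systems LD_PRELOAD gives a quick comparison):

```shell
# Interpose mimalloc's malloc/free over the libc allocator for one
# process. /usr/lib/libmimalloc.so is a placeholder path; adjust for
# your distro. Works with both glibc's and musl's dynamic linker.
LD_PRELOAD=/usr/lib/libmimalloc.so ./my_program
```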

    • jeffbee 7 hours ago

      That's a great combo. I like LLVM libc in overlay mode with musl beneath and mimalloc. Performance is excellent.

  • timeinput 14 hours ago

    My takeaway is that it's not a meaningful chart? Just in the first row, musl looks bloated at 426k compared to dietlibc at 120k. Why were those colors chosen? It's arbitrary and up to the author of the chart.

    The author of musl made a chart that focused on the things they cared about, benchmarked them, and found that for the things they prioritized they were better than other standard library implementations (at least by counting green rows)? Neat.

    I mean, I'm glad they made the library, that it's useful, and that it's meeting the goals they set out to solve, but what would the same chart look like if the other library authors had created it?

  • cyberax 10 hours ago

    Not quite correct. glibc is slow if you need to be able to fork quickly.

    However, it does have super-optimized string/memory functions. There are highly optimized assembly language implementations of them that use SIMD for dozens of different CPUs.

edam 12 hours ago

Pretty obviously made by the musl author.

  • deaddodo 10 hours ago

    > "I have tried to be fair and objective, but as I am the author of musl"

    Yeah, pretty obvious when they state as much in the first paragraph.