GNU — Cherie's Tech Blog

Mar 30, 2022 5 min read

GCC malloc + memset Optimization

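I recently ran into a small compiler-optimization pitfall: with GCC 5 and later, enabling optimization level 2 (-O2) turns a malloc followed by a zeroing memset into a single calloc. It looks like a reasonable improvement, but I did not notice it while customizing malloc, so the program ended up calling libc's calloc instead of my wrap_malloc.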
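
To make the pattern concrete, here is a minimal sketch of the kind of code the optimizer can rewrite. The function name alloc_zeroed is hypothetical and does not come from the original program; whether the rewrite actually happens depends on the GCC version and flags.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate a buffer and zero it "by hand".
   With GCC 5+ at -O2, the malloc + memset(…, 0, …) pair below may be
   merged into one calloc() call, so a wrapper installed only for
   malloc would never see this allocation. */
void *alloc_zeroed(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        memset(p, 0, size);   /* zero fill is what makes the pair calloc-like */
    return p;
}
```

One way to check is to compile with -O2 -S and look for a call to calloc in the assembly. Building with -fno-builtin-malloc should stop GCC from treating malloc as a builtin and therefore from merging the two calls, though that is worth verifying against the compiler version in use.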

Cause

I use void *__wrap_malloc(size_t size) to add extra handling to the malloc function (see How to wrap a system call (libc function) in Linux). One part of the code is:

Aug 17, 2019 4 min read

Comparing semaphore sem_post in glibc v2.0 and v2.1

Preface

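While reading the futex man page, I happened across the article Linux Futex的设计与实现. It mentions that sem_post still issues a futex system call even when it is not contending with any other thread. The article explains why, but this looked like an obvious performance problem that would surely have been reported and fixed. I then looked through the current source code and saw that the behavior belongs to the GLIBC_2_0 version (the article is also fairly old, so the version it describes was presumably old too), while the current GLIBC_2_1 version improves this part. So here is a short note on where the two differ.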

Version

glibc 2.29

GLIBC_2_0 version

int attribute_compat_text_section __old_sem_post (sem_t *sem)
{
  int *futex = (int *) sem;
  atomic_write_barrier ();
  (void) atomic_increment_val (futex);
  int err = lll_futex_wake(futex, 1, LLL_SHARED);
  if (__builtin_expect (err, 0) < 0)
    {
      __set_errno (-err);
      return -1;
    }
  return 0;
}

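As the source code shows, once the sem value has been incremented by 1, the function immediately uses lll_futex_wake to invoke the futex wake system call. The reason, as the article also mentions, is that after the increment the mechanism of that time had no way to tell whether any other thread was waiting.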
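
The cost of not knowing about waiters can be illustrated with a toy model. This is not glibc code: the struct toy_sem, its nwaiters field, and both functions are hypothetical, and a plain counter stands in for the futex wake system call.

```c
#include <stdatomic.h>

/* Toy model only: counts how often a "wake" would be issued. */
struct toy_sem {
    atomic_int value;     /* semaphore count */
    atomic_int nwaiters;  /* hypothetical count of blocked threads */
    int wake_calls;       /* stand-in for futex wake system calls */
};

/* Post with no waiter information: it must always wake,
   because a thread might be blocked on the semaphore. */
void toy_post_always_wake(struct toy_sem *s)
{
    atomic_fetch_add(&s->value, 1);
    s->wake_calls++;
}

/* Post that consults a waiter count and skips the wake
   when no thread is waiting. */
void toy_post_if_waiters(struct toy_sem *s)
{
    atomic_fetch_add(&s->value, 1);
    if (atomic_load(&s->nwaiters) > 0)
        s->wake_calls++;
}
```

In the uncontended case the first variant pays for a wake on every post, while the second pays nothing extra, which is the kind of saving a waiter-aware design makes possible.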