Author: haopengzhan

  • How do we know if Kubelet leaks Inotify watchers

    Kubelet, the node agent of open-source Kubernetes, constantly needs to monitor file paths. Because it uses inotify to do so, Kubelet is exposed to the possibility of leaking inotify watchers. Recently, I observed a case where the Kubelet hung because of enormous inotify usage. This post briefly discusses how I debugged the process and located the problem.…

  • Practical debugging methods for Kubelet

    Kubelet, a vital component in Kubernetes, runs on each node in your cluster. It acts as the field manager, receiving instructions from the Kubernetes API server and ensuring containerized applications run smoothly. Kubelet is responsible for downloading container images, pulling secrets, and launching pods – the basic units containing your application containers. It also monitors…

  • Selected Labs for CS350 courses in Binghamton University

    TL;DR: This will be a series about the labs I gave during the spring 2022 semester. I am writing this down because a week has passed and no students have asked for the solution to the last lab. I realize that the learning gap between students is huge, especially when…

  • EDDL: How do we train neural networks on limited edge devices – PART 2

    In the last post, Part 1, I gave a general overview of our idea of distributed learning in edge environments. I introduced why edge distributed learning is needed and what improvements it can achieve. In this post, I will talk about our motivation study and how our framework works.

  • EDDL: How do we train neural networks on limited edge devices – PART 1

    This post introduces a previous milestone of our project, “Edge Trainer”, marked by the publication of the paper “EDDL: A Distributed Deep Learning System for Resource-limited Edge Computing Environment.” As the first part of the introduction, I focus only on the motivation and a summary of our work. More details on design and implementation can be found…