CHEN Hongxi, QIU Xiaobin, QU Yang, et al. Design and implementation of observability system based on event storage engine[J]. Journal on Communications, 2024, 45(Z2): 177-185. DOI: 10.11959/j.issn.1000-436x.2024256.
Design and implementation of observability system based on event storage engine
Over the past decade, drastic changes in software architecture and the widespread adoption of new technologies have significantly increased the complexity of software systems, leading to a surge in software bugs and making system failures more difficult to troubleshoot. To address these challenges, and targeting traditional monitoring systems built as monolithic services, a design and implementation method for a more flexible and efficient monitoring system was proposed. The method abstracted traditional monitoring data sources into a unified event model and designed a corresponding storage engine that offered a unified write and query API. Based on this event storage engine, an observability system was constructed that provides richer and more powerful query and analysis capabilities. Application results demonstrate that, compared with traditional monitoring systems, the observability system developed in this paper significantly improves the efficiency of troubleshooting and problem analysis.
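The abstract does not specify the event schema or the storage engine's API. As a minimal sketch, assuming a hypothetical `Event` record with a timestamp, a source kind, a service name, and free-form attributes, the Python below shows how logs, metrics, and trace spans could be mapped into one event type and stored behind a single write/query interface. `Event`, `EventStore`, `from_log`, `from_metric`, and `from_span` are illustrative names, not the authors' implementation, and the in-memory sorted list stands in for whatever persistent storage the paper actually uses.

```python
import bisect
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Event:
    """One record in a hypothetical unified event model (assumed fields)."""
    timestamp: float                      # Unix seconds
    source: str                           # assumed kinds: "log", "metric", "trace"
    service: str                          # emitting service
    attributes: Dict[str, Any] = field(default_factory=dict)


# Hypothetical adapters: each traditional data source becomes an Event.
def from_log(ts: float, service: str, level: str, message: str) -> Event:
    return Event(ts, "log", service, {"level": level, "message": message})


def from_metric(ts: float, service: str, name: str, value: float) -> Event:
    return Event(ts, "metric", service, {"name": name, "value": value})


def from_span(ts: float, service: str, trace_id: str,
              operation: str, duration_ms: float) -> Event:
    return Event(ts, "trace", service, {"trace_id": trace_id,
                                        "operation": operation,
                                        "duration_ms": duration_ms})


class EventStore:
    """In-memory store that keeps all events sorted by timestamp."""

    def __init__(self) -> None:
        self._timestamps: List[float] = []
        self._events: List[Event] = []

    def write(self, event: Event) -> None:
        # bisect_right preserves arrival order among equal timestamps.
        i = bisect.bisect_right(self._timestamps, event.timestamp)
        self._timestamps.insert(i, event.timestamp)
        self._events.insert(i, event)

    def query(self, start: float, end: float,
              source: Optional[str] = None,
              service: Optional[str] = None,
              where: Optional[Callable[[Event], bool]] = None) -> List[Event]:
        # Binary search narrows to the half-open window [start, end).
        lo = bisect.bisect_left(self._timestamps, start)
        hi = bisect.bisect_left(self._timestamps, end)
        result = []
        for e in self._events[lo:hi]:
            if source is not None and e.source != source:
                continue
            if service is not None and e.service != service:
                continue
            if where is not None and not where(e):
                continue
            result.append(e)
        return result


store = EventStore()
store.write(from_log(100.0, "checkout", "ERROR", "payment timeout"))
store.write(from_metric(100.5, "checkout", "latency_ms", 870.0))
store.write(from_span(101.0, "checkout", "t-42", "POST /pay", 850.0))
store.write(from_log(105.0, "search", "INFO", "ok"))

# One query returns logs, metrics, and spans for the same service and window.
hits = store.query(99.0, 102.0, service="checkout")
print([e.source for e in hits])  # ['log', 'metric', 'trace']

# Attribute predicates work across sources: find slow spans anywhere.
slow = store.query(0.0, 200.0,
                   where=lambda e: e.attributes.get("duration_ms", 0) > 500)
```

Keeping all three kinds of data in one time-ordered store is what lets a single query return the error log, the latency metric, and the slow span for the same service and time window. A production engine would additionally need persistence, attribute indexing, and retention policies, none of which this sketch attempts.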