GLORIA

GEOMAR Library Ocean Research Information Access

  • 1
    Online Resource
    Association for Computing Machinery (ACM) ; 2018
    In: Proceedings of the ACM on Measurement and Analysis of Computing Systems, Association for Computing Machinery (ACM), Vol. 2, No. 1 (2018-04-03), p. 1-29
    Abstract: In modern datacenter networks (DCNs), failures of network devices are the norm rather than the exception, and many research efforts have focused on dealing with failures after they happen. In this paper, we take a different approach by predicting failures, thus the operators can intervene and "fix" the potential failures before they happen. Specifically, in our proposed system, named PreFix, we aim to determine during runtime whether a switch failure will happen in the near future. The prediction is based on the measurements of the current switch system status and historical switch hardware failure cases that have been carefully labelled by network operators. Our key observation is that failures of the same switch model share some common syslog patterns before failures occur, and we can apply machine learning methods to extract the common patterns for predicting switch failures. Our novel set of features (message template sequence, frequency, seasonality and surge) for machine learning can efficiently deal with the challenges of noises, sample imbalance, and computation overhead. We evaluated PreFix on a data set collected from 9397 switches (3 different switch models) deployed in more than 20 datacenters owned by a top global search engine in a 2-year period. PreFix achieved an average of 61.81% recall and 1.84 * 10^-5 false positive ratio. It outperforms the other failure prediction methods for computers and ISP devices.
    Type of Medium: Online Resource
    ISSN: 2476-1249
    Language: English
    Publisher: Association for Computing Machinery (ACM)
    Publication Date: 2018
    ZDB ID: 2924209-5
  • 2
    Online Resource
    Association for Computing Machinery (ACM) ; 2019
    In: ACM SIGMETRICS Performance Evaluation Review, Association for Computing Machinery (ACM), Vol. 46, No. 1 (2019-01-17), p. 64-66
    Type of Medium: Online Resource
    ISSN: 0163-5999
    Language: English
    Publisher: Association for Computing Machinery (ACM)
    Publication Date: 2019
    ZDB ID: 199353-7
    ZDB ID: 2089001-1
  • 3
    Online Resource
    Association for Computing Machinery (ACM) ; 2019
    In: ACM SIGMETRICS Performance Evaluation Review, Association for Computing Machinery (ACM), Vol. 46, No. 1 (2019-01-17), p. 64-66
    Abstract: In modern datacenter networks (DCNs), failures of network devices are the norm rather than the exception, and many research efforts have focused on dealing with failures after they happen. In this paper, we take a different approach by predicting failures, thus the operators can intervene and "fix" the potential failures before they happen. Specifically, in our proposed system, named PreFix, we aim to determine during runtime whether a switch failure will happen in the near future. The prediction is based on the measurements of the current switch system status and historical switch hardware failure cases that have been carefully labelled by network operators. Our key observation is that failures of the same switch model share some common syslog patterns before failures occur, and we can apply machine learning methods to extract the common patterns for predicting switch failures. Our novel set of features (message template sequence, frequency, seasonality and surge) for machine learning can efficiently deal with the challenges of noises, sample imbalance, and computation overhead. We evaluated PreFix on a data set collected from 9397 switches (3 different switch models) deployed in more than 20 datacenters owned by a top global search engine in a 2-year period. PreFix achieved an average of 61.81% recall and 1.84 * 10^-5 false positive ratio, outperforming the other failure prediction methods for computers and ISP devices.
    Type of Medium: Online Resource
    ISSN: 0163-5999
    Language: English
    Publisher: Association for Computing Machinery (ACM)
    Publication Date: 2019
    ZDB ID: 199353-7
    ZDB ID: 2089001-1