Software Repositories and Machine Learning Research in Cyber Security: Discussions

February 8th 2024 New Story

by @escholar

This paper is available on arXiv under a CC 4.0 license.

Authors:

(1) Mounika Vanamala, Department of Computer Science, University of Wisconsin-Eau Claire, United States;

(2) Keith Bryant, Department of Computer Science, University of Wisconsin-Eau Claire, United States;

(3) Alex Caravella, Department of Computer Science, University of Wisconsin-Eau Claire, United States.

Table of Links

Abstract & Introduction

Discussions

Conclusions, Acknowledgment, and References

Discussions

The semantics of words play a crucial role in categorizing text correctly with ML. Preprocessing can reduce two different words to the same token, which can lead to inaccurate classification. For example, preprocessing reduces both desert and deserted to desert, so the distinct meaning of deserted is lost. An ML model must therefore be effective at semantic analysis if it is to recommend relevant vulnerabilities from the CAPEC database. A second consideration is whether to implement an unsupervised, supervised, or semi-supervised ML model. The goal of this research would be to compare keywords from an SRS document with the keywords of CAPEC vulnerabilities.
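The collapse of desert and deserted, and the keyword comparison named as the research goal, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `naive_stem` is a toy suffix stripper rather than the preprocessing the authors used, Jaccard overlap is only one possible way to compare keyword sets, and the two text snippets are invented rather than drawn from a real SRS document or from CAPEC.

```python
def naive_stem(word):
    """Toy suffix stripper; a hypothetical stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)]
    return word


def keywords(text):
    """Lowercase, split on whitespace, and stem each token into a set."""
    return {naive_stem(token) for token in text.lower().split()}


def jaccard(a, b):
    """Share of keywords common to both sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Both words collapse to the same token, so their difference in meaning is lost.
same_token = naive_stem("desert") == naive_stem("deserted")

# Hypothetical snippets, not taken from a real SRS document or from CAPEC.
srs_keywords = keywords("user enters password")
capec_keywords = keywords("attacker guessing passwords")
overlap = jaccard(srs_keywords, capec_keywords)
print(same_token, sorted(srs_keywords & capec_keywords), overlap)
```

The same stemming step that lets "password" and "passwords" match is the one that erases the distinction between desert and deserted, which is why semantic handling matters for this kind of comparison.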

Unsupervised Machine Learning (ML) algorithms are used primarily to segregate data into clusters, uncover underlying relationships in the data, and reduce dimensionality. Dimensionality reduction, for instance, is valuable for extensive datasets such as CAPEC, because it aims to simplify the data while preserving its integrity.

In text analysis, research on Latent Dirichlet Allocation (LDA) found that substantial adaptation was required to achieve satisfactory outcomes, largely because of semantic limitations. The Latent Semantic Analysis (LSA) algorithm, by contrast, is designed to capture semantics and to establish connections between the vectors into which words are mapped. LSA is commonly built on Singular Value Decomposition (SVD) and has over time been combined with other, more intricate algorithms to enhance its effectiveness. Unsupervised methods in general are hard to evaluate, because there are no well-defined metrics for measuring model accuracy, and this makes it more difficult to judge the quality of the results they produce.
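As a rough illustration of how LSA relates documents through a low-rank decomposition, the sketch below builds a term-document matrix and keeps the top two singular directions. It is written under several assumptions: the four texts are invented, power iteration with deflation stands in for a library SVD routine so that the example needs only the standard library, and all function names are hypothetical.

```python
import math

# Hypothetical toy corpus: two injection-like texts and two unrelated ones.
docs = [
    "attacker injects sql query into login form",
    "sql injection through unsanitized query input",
    "user uploads profile photo image",
    "image upload stores photo file",
]


def term_document_matrix(texts):
    """Rows are vocabulary terms, columns are documents, entries are counts."""
    vocab = sorted({w for t in texts for w in t.split()})
    return [[t.split().count(w) for t in texts] for w in vocab]


def gram(a):
    """Return A^T A (documents x documents) for a terms x documents matrix."""
    n = len(a[0])
    return [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]


def top_eigenpair(m, iterations=500):
    """Power iteration for the dominant eigenpair of a symmetric matrix."""
    n = len(m)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    value = sum(v[i] * sum(m[i][j] * v[j] for j in range(n)) for i in range(n))
    return value, v


def lsa_document_vectors(texts, k=2):
    """Rank-k document coordinates (singular value times right singular vector)."""
    m = gram(term_document_matrix(texts))
    n = len(m)
    coords = [[] for _ in range(n)]
    for _ in range(k):
        value, v = top_eigenpair(m)
        sigma = math.sqrt(value)
        for i in range(n):
            coords[i].append(sigma * v[i])
        # Deflate so the next pass finds the next singular direction.
        m = [[m[i][j] - value * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


vectors = lsa_document_vectors(docs)
print(round(cosine(vectors[0], vectors[1]), 3))  # two injection-like texts
print(round(cosine(vectors[0], vectors[2]), 3))  # injection text vs. upload text
```

In this toy corpus the two groups share no vocabulary, so the separation is easy. On real data there is no ground-truth label to check the latent space against, which is the evaluation difficulty described above.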

Supervised ML is a less complex process and requires fewer tools than unsupervised ML (IBM, 2019). It uses a labeled training dataset and validation techniques to derive accurate results more quickly than unsupervised ML, which instead works by clustering objects into similar groups identified by the algorithm. The largest limitation of supervised ML is the need to obtain a training dataset with which to prepare the implemented algorithm. Supervised ML also makes it significantly easier to obtain metrics for the accuracy of results.
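The point about accuracy metrics can be made concrete with a small sketch: once labeled examples exist, a held-out validation set yields an accuracy figure directly. The classifier below is a minimal multinomial Naive Bayes with add-one smoothing, chosen only for brevity and not because the paper uses it. The texts, the two labels, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Fit a multinomial Naive Bayes model from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest smoothed log-probability."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in text.split():
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, examples):
    """Fraction of held-out examples whose predicted label is correct."""
    return sum(predict(model, t) == y for t, y in examples) / len(examples)


# Hypothetical labeled data; the labels are illustrative, not CAPEC categories.
training = [
    ("sql query injection login", "injection"),
    ("command injection shell input", "injection"),
    ("password guessing brute force", "authentication"),
    ("stolen session token login", "authentication"),
]
validation = [
    ("shell command input", "injection"),
    ("brute force password", "authentication"),
]

model = train(training)
print(accuracy(model, validation))
```

A score on two invented examples says nothing about real performance. The sketch only shows that the metric is well defined once labels are available, and that assembling those labels is the costly step.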
