Software Repositories and Machine Learning Research in Cyber Security: Discussions

source link: https://hackernoon.com/software-repositories-and-machine-learning-research-in-cyber-security-discussions

February 8th 2024
by @escholar

This paper is available on arxiv under CC 4.0 license.

Authors:

(1) Mounika Vanamala, Department of Computer Science, University of Wisconsin-Eau Claire, United States;

(2) Keith Bryant, Department of Computer Science, University of Wisconsin-Eau Claire, United States;

(3) Alex Caravella, Department of Computer Science, University of Wisconsin-Eau Claire, United States.

Table of Links

Abstract & Introduction

Discussions

Conclusions, Acknowledgment, and References

Discussions

Word semantics play a crucial role in categorizing text correctly with ML. Preprocessing steps such as stemming can reduce two different words to the same token, which can lead to inaccurate classification. For example, preprocessing reduces both desert and deserted to desert, so the distinct meaning of deserted is lost. For an ML model to recommend relevant vulnerabilities from the CAPEC (Common Attack Pattern Enumeration and Classification) database, it would therefore need to perform semantic analysis effectively.

A second consideration is whether to implement an unsupervised, supervised, or semi-supervised ML model. The goal of this research would be to compare keywords from a Software Requirements Specification (SRS) document with the keywords of CAPEC vulnerabilities.
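The stemming problem and the keyword comparison described above can be illustrated with a small sketch. It assumes a hypothetical, deliberately naive suffix-stripping stemmer (not the Porter stemmer or any specific library) and uses Jaccard overlap as one possible way to compare keyword sets; the SRS and CAPEC-style sentences are invented for illustration and are not taken from the study.

```python
import re

# Hypothetical, deliberately naive suffix list. Real stemmers such as the
# Porter stemmer apply ordered rules with extra conditions; this toy only
# shows how stripping suffixes can merge words with different meanings.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def keywords(text: str) -> set[str]:
    """Lowercase the text, split it into alphabetic tokens, and stem each one."""
    return {naive_stem(tok) for tok in re.findall(r"[a-z]+", text.lower())}


def jaccard(a: set[str], b: set[str]) -> float:
    """Keyword overlap |A & B| / |A | B|, from 0.0 (none) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    # "desert" and "deserted" collapse to the same token.
    print(naive_stem("desert"), naive_stem("deserted"))  # desert desert
    # Hypothetical SRS requirement vs. hypothetical CAPEC-style description.
    srs = keywords("The system stores user passwords")
    capec = keywords("Attacker cracks stored password hashes")
    print(sorted(srs & capec), jaccard(srs, capec))  # ['password', 'stor'] 0.25
```

Here stores and stored also collapse to the same stem, which helps the match, while the same mechanism erases the distinction between desert and deserted.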

Unsupervised ML algorithms are used primarily to group data into clusters, uncover underlying relationships in the data, and reduce dimensionality. Dimensionality reduction, for instance, is valuable for large datasets such as CAPEC, because it reduces the number of features while preserving as much of the data's meaningful structure as possible.
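The text names clustering as a core unsupervised task but does not specify an algorithm. As an illustration only, the following is a minimal sketch of Lloyd's k-means, a common clustering algorithm swapped in here as an assumption, applied to hypothetical 2-D feature vectors; in practice such vectors might come from text features of CAPEC entries.

```python
import math
import random

Point = tuple[float, float]


def kmeans(points: list[Point], k: int, iters: int = 20, seed: int = 0) -> list[int]:
    """Lloyd's k-means: return a cluster index (0..k-1) for each point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct input points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return labels


if __name__ == "__main__":
    # Two well-separated hypothetical groups of feature vectors.
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(kmeans(pts, k=2))
```

The function returns only cluster indices: the algorithm decides which points belong together, but with no labels there is nothing to score those groups against.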

In text analysis, research on Latent Dirichlet Allocation (LDA) found that it required substantial adaptation to achieve satisfactory results, largely because of its semantic limitations. Latent Semantic Analysis (LSA), by contrast, is designed to capture semantics by mapping words and documents to vectors and relating those vectors to one another. LSA typically relies on Singular Value Decomposition (SVD) to reduce the term-document matrix, and over time it has often been combined with other, more elaborate algorithms to improve its effectiveness. Evaluating unsupervised methods in general is difficult, however, because there are no well-defined metrics for measuring model accuracy. Without clear metrics, results are harder to interpret, and it is harder to judge the quality of what an unsupervised ML approach produces.
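The following is a minimal sketch of how LSA uses truncated SVD, assuming a tiny hypothetical term-document count matrix (the terms and documents are invented, and a real pipeline would usually apply TF-IDF weighting and use far more terms). It is not the authors' implementation; it only shows documents being projected into a k-dimensional latent space and compared there.

```python
import numpy as np


def lsa_doc_vectors(term_doc: np.ndarray, k: int) -> np.ndarray:
    """Latent Semantic Analysis via truncated SVD.

    term_doc has one row per term and one column per document. Factorize
    X = U S V^T, keep the k largest singular values, and return the
    document coordinates S_k V_k^T transposed (one row per document).
    """
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    return (np.diag(s[:k]) @ vt[:k, :]).T


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical term-document counts. Columns: d0 = an SRS requirement,
# d1 and d2 = two CAPEC-style descriptions.
X = np.array([
    [1, 1, 0],  # password
    [1, 0, 0],  # hash
    [0, 1, 0],  # credential
    [0, 0, 1],  # sql
    [0, 0, 1],  # injection
], dtype=float)

docs = lsa_doc_vectors(X, k=2)
print(round(cosine(X[:, 0], X[:, 1]), 3))       # raw term space: 0.5
print(round(cosine(docs[0], docs[1]), 3))       # latent space: 1.0
print(round(abs(cosine(docs[0], docs[2])), 3))  # unrelated topic: 0.0
```

With k=2, the discarded component (singular value 1) is the one that separates hash from credential, so d0 and d1 become identical in the latent space through their shared password term, while the SQL-injection document stays orthogonal.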

Supervised ML is a less complex process and requires fewer tools than unsupervised ML (IBM, 2019). It uses a labeled training dataset and validation techniques to produce accurate results more quickly than unsupervised ML, which instead groups objects into clusters of similar items that the algorithm identifies on its own. The largest limitation of supervised ML is the need to obtain a training dataset to prepare the algorithm. Supervised ML is also considerably better suited to measuring the accuracy of its results, because predictions can be checked against known labels.
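To illustrate why labeled data makes accuracy straightforward to measure, the following is a minimal sketch of a supervised nearest-centroid classifier evaluated on a held-out validation set. The classifier, the feature vectors, and the "low"/"high" labels are hypothetical and are not the method proposed in this research; in this setting, labels might instead mark whether a CAPEC entry is relevant to a given requirement.

```python
import math
from collections import defaultdict

Vector = tuple[float, ...]


def fit_centroids(X: list[Vector], y: list[str]) -> dict[str, Vector]:
    """Training: average the feature vectors of each labeled class."""
    groups: dict[str, list[Vector]] = defaultdict(list)
    for x, label in zip(X, y):
        groups[label].append(x)
    return {label: tuple(sum(col) / len(vs) for col in zip(*vs))
            for label, vs in groups.items()}


def predict(centroids: dict[str, Vector], x: Vector) -> str:
    """Assign x to the class whose centroid is nearest (Euclidean)."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


def accuracy(centroids: dict[str, Vector], X_val: list[Vector], y_val: list[str]) -> float:
    """Fraction of held-out examples whose predicted label matches the true one."""
    hits = sum(predict(centroids, x) == y for x, y in zip(X_val, y_val))
    return hits / len(y_val)


if __name__ == "__main__":
    # Hypothetical labeled training and validation data.
    model = fit_centroids([(0, 0), (0, 1), (10, 10), (10, 11)],
                          ["low", "low", "high", "high"])
    acc = accuracy(model, [(1, 0), (9, 9), (0, 9)], ["low", "high", "high"])
    print(round(acc, 3))  # 0.667: the point (0, 9) is misclassified as "low"
```

Because every validation example carries a known label, the accuracy is a single, directly interpretable number, which is exactly what the clustering setting lacks.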

