GLORIA — GEOMAR Library Ocean Research Information Access

hits 1 - 3 | 3 hits

Online Resource

Software plagiarism detection in multiprogramming languages using machine learning approach

Ullah, Farhan ; Wang, Junfeng ; Farhan, Muhammad ; [et al.]

Wiley ; 2021

In: Concurrency and Computation: Practice and Experience Vol. 33, No. 4 ( 2021-02-25)

In: Concurrency and Computation: Practice and Experience, Wiley, Vol. 33, No. 4 ( 2021-02-25)

Abstract: Software plagiarism, which gives rise to software piracy, is a growing concern. It poses a serious risk to the software industry and causes large economic losses every year. Customers may develop a modified version of the original software in other programming languages. Furthermore, detecting plagiarism across different types of source code is challenging because each language has its own syntax rules. In this paper, we propose a methodology for software plagiarism detection across multiple programming languages based on machine learning approaches. Principal Component Analysis (PCA) is applied to extract features from the source code without losing the actual information. It extracts features by factor analysis and converts the dataset into normalized linear principal components, which are then used for predictive analysis. A multinomial logistic regression (MLR) model is then applied to these components to classify the source code documents based on the predictions. MLR generalizes logistic regression to handle multiclass problems. Further, the performance of the predictors in MLR is evaluated by a two-tailed z-test. For the experiment, the dataset was collected in five popular languages, i.e., C, C++, Java, C#, and Python. Each programming language is represented in two case studies, i.e., binary search and stack.
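
Note: the abstract mentions evaluating the MLR predictors with a two-tailed z-test but does not say how the statistic is formed. The following is a minimal sketch, assuming the common Wald form (estimated coefficient divided by its standard error) and a standard normal reference distribution. The function name and the coefficient values are hypothetical and are not taken from the paper.

```python
import math


def two_tailed_z_test(coefficient, std_error):
    """Return (z, p) for the null hypothesis that the coefficient is zero.

    Assumes a Wald-style statistic z = coefficient / std_error that is
    approximately standard normal under the null hypothesis.
    """
    if std_error <= 0:
        raise ValueError("std_error must be positive")
    z = coefficient / std_error
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical (coefficient, standard error) pairs for two principal components.
predictors = {"PC1": (1.20, 0.40), "PC2": (0.15, 0.30)}

for name, (coef, se) in predictors.items():
    z, p = two_tailed_z_test(coef, se)
    print(f"{name}: z = {z:.2f}, p = {p:.4f}, significant at 0.05: {p < 0.05}")
```

A predictor whose p-value falls below the chosen significance level (0.05 here, also an assumption) would be treated as contributing to the classification.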

Type of Medium: Online Resource

ISSN: 1532-0626 , 1532-0634

DOI (issue): 10.1002/cpe.v33.4

DOI (article): 10.1002/cpe.5000

RVK: SA 4270

Language: English

Publisher: Wiley

Publication Date: 2021

ZDB-ID: 2052606-4

SSG: 11

Online Resource

A New Software Birthmark based on Weight Sequences of Dynamic Control Flow Graph for Plagiarism Detection

Yuan, Baoguo ; Wang, Junfeng ; Fang, Zhiyang ; [et al.]

Oxford University Press (OUP) ; 2018

In: The Computer Journal Vol. 61, No. 8 ( 2018-08-01), p. 1202-1215

In: The Computer Journal, Oxford University Press (OUP), Vol. 61, No. 8 ( 2018-08-01), p. 1202-1215

Type of Medium: Online Resource

ISSN: 0010-4620 , 1460-2067

DOI: 10.1093/comjnl/bxy055

RVK: SQ 1100 , SA 3740

Language: English

Publisher: Oxford University Press (OUP)

Publication Date: 2018

ZDB-ID: 1477172-X

Online Resource

SeqMask: Behavior Extraction Over Cyber Threat Intelligence Via Multi-Instance Learning

Ge, Wenhan ; Wang, Junfeng

Oxford University Press (OUP) ; 2022

In: The Computer Journal ( 2022-11-29)

In: The Computer Journal, Oxford University Press (OUP), ( 2022-11-29)

Abstract: Identification and extraction of Tactics, Techniques and Procedures (TTPs) for Cyber Threat Intelligence (CTI) restore the full picture of cyber attacks and guide analysts in assessing system risk. Existing frameworks can hardly provide uniform and complete processing mechanisms for TTP information extraction without adequate background knowledge. A multi-instance learning approach named SeqMask is proposed in this paper as a solution. SeqMask extracts behavior keywords from CTI, evaluated by their semantic impact, and predicts TTP labels by conditional probabilities. The framework also has two mechanisms to determine the validity of keywords: one uses verification by expert experience, and the other measures the distortion of the classification effect when existing keywords are blocked. In the experiments, SeqMask reached F1 scores of 86.07% and 73.99% for TTP classification. For the top 20% of keywords, the expert approval rating is 92.20%, and the average repetition of keywords whose scores are between 90% and 100% is 60.02%. In particular, when the top 65% of the keywords were blocked, the F1 decreased to about 50%; when removing the top 50%, the F1 was under 31%. Further, we also validate the possibility of extracting TTPs from full-size CTI and malware, for which the F1 scores are improved by 2.16% and 0.81%.

Type of Medium: Online Resource

ISSN: 0010-4620 , 1460-2067

DOI: 10.1093/comjnl/bxac172

RVK: SQ 1100 , SA 3740

Language: English

Publisher: Oxford University Press (OUP)

Publication Date: 2022

ZDB-ID: 1477172-X
