Vector Similarity

Python and Java implementations of TS-SS, the similarity measure proposed in "A Hybrid Geometric Approach for Measuring Similarity Level Among Documents and Document Clustering"

Project README

Vector_Similarity

This README also summarizes the paper "A Hybrid Geometric Approach for Measuring Similarity Level Among Documents and Document Clustering".
I recommend TS-SS instead of cosine similarity or Euclidean distance.

The reasons are...

Cosine drawbacks

[Figure: cosine drawback]
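
A minimal sketch of one well-known limitation of cosine similarity: it depends only on the angle between two vectors, so it cannot tell a vector apart from a scaled copy of itself. The function name and sample vectors are illustrative assumptions, not taken from the repository.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (both must be non-zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical vectors: `long_doc` is `short_doc` scaled by 10.
short_doc = [1.0, 2.0]
long_doc = [10.0, 20.0]

# The similarity is (numerically) 1.0 even though the magnitudes differ a lot.
same_direction = cosine_similarity(short_doc, long_doc)
print(same_direction)
```

Because the two vectors point in the same direction, cosine similarity treats them as identical and the tenfold difference in magnitude is lost.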

Euclidean drawbacks

[Figure: Euclidean drawback]
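
A minimal sketch of a commonly cited limitation of Euclidean distance: it ignores the angle between vectors, so two vectors can be equally far from a reference vector while pointing in very different directions. The helper names and sample vectors are illustrative assumptions, not taken from the repository.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between the end points of a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def angle_degrees(a, b):
    """Angle between non-zero vectors a and b, in degrees."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norms))))


# Hypothetical vectors: b and c are equally distant from a.
a = [1.0, 0.0]
b = [0.0, 1.0]
c = [2.0, 1.0]

dist_ab = euclidean_distance(a, b)
dist_ac = euclidean_distance(a, c)
angle_ab = angle_degrees(a, b)
angle_ac = angle_degrees(a, c)
print(dist_ab, dist_ac, angle_ab, angle_ac)
```

Here `b` and `c` are the same distance from `a`, yet `b` is perpendicular to `a` while `c` is much closer to it in direction. Euclidean distance alone does not separate the two cases.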

Triangle's Area Similarity (TS)

Sector's Area Similarity (SS)

TS-SS

[Figure: TS-SS]
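
A rough sketch of how TS, SS, and their product could be computed, based on one reading of the formulas in the referenced paper. It assumes the angle is the arccosine of the cosine similarity plus 10 degrees, TS is the area of the triangle spanned by the two vectors, and SS is the area of a circular sector whose radius is the Euclidean distance plus the magnitude difference. All function names are hypothetical, and the repository's own Python and Java code may differ in details such as angle units.

```python
import math


def norm(v):
    """Magnitude (L2 norm) of a vector."""
    return math.sqrt(sum(x * x for x in v))


def euclidean(a, b):
    """Euclidean distance (ED) between a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def theta_degrees(a, b):
    """Angle between non-zero a and b in degrees, plus the assumed 10 degree offset."""
    cos = sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))
    cos = max(-1.0, min(1.0, cos))  # guard against floating-point overshoot
    return math.degrees(math.acos(cos)) + 10.0


def triangle_similarity(a, b):
    """TS: area of the triangle with sides |a|, |b| and included angle theta."""
    return norm(a) * norm(b) * math.sin(math.radians(theta_degrees(a, b))) / 2.0


def sector_similarity(a, b):
    """SS: area of a sector with radius ED + MD and central angle theta."""
    md = abs(norm(a) - norm(b))  # magnitude difference (MD)
    return math.pi * (euclidean(a, b) + md) ** 2 * theta_degrees(a, b) / 360.0


def ts_ss(a, b):
    """TS-SS: product of the triangle and sector terms."""
    return triangle_similarity(a, b) * sector_similarity(a, b)


# Hypothetical vectors: b is a scaled copy of a, c points in another direction.
a = [1.0, 2.0]
b = [10.0, 20.0]
c = [2.0, 1.0]

score_same = ts_ss(a, a)
score_scaled = ts_ss(a, b)
score_rotated = ts_ss(a, c)
print(score_same, score_scaled, score_rotated)
```

Under this formulation the score is zero for identical vectors and grows as vectors differ in either direction or magnitude, so smaller values indicate closer vectors. A scaled copy, which cosine similarity cannot distinguish from the original, gets a non-zero score. Zero vectors are not handled in this sketch.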

Results

[Figure: results]

Conclusion

On the largest dataset, TS-SS outperforms cosine by a significant margin, while on the other datasets it outperforms cosine only slightly.
The significantly better result on the largest dataset therefore supports the robustness and reliability of the model for big data and real-world data, where the variety of documents and texts is high.

Reference

[1] A Hybrid Geometric Approach for Measuring Similarity Level Among Documents and Document Clustering

Open Source Agenda is not affiliated with "Vector Similarity" Project. README Source: taki0112/Vector_Similarity

License

MIT
