Gensim Data

Data repository for pretrained NLP models and NLP corpora.
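
The whole catalogue can also be browsed programmatically through gensim's downloader API. A minimal sketch (api.info() and the return_path flag belong to gensim.downloader; the item name used below is just one of the entries listed further down):

import gensim.downloader as api

# List everything hosted in this repository, with its metadata.
info = api.info()
print(sorted(info["models"]))
print(sorted(info["corpora"]))

# Metadata for a single entry (file size, description, license, ...).
print(api.info("glove-twitter-25"))

# Download without loading into memory; returns the local file path.
path = api.load("glove-twitter-25", return_path=True)
print(path)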

glove-twitter-200

Pre-trained GloVe vectors based on 2B tweets, 27B tokens, 1.2M vocab, uncased.

File size: 759 MB
Number of vectors: 1,193,514
Dimension: 200
License: http://opendatacommons.org/licenses/pddl/

Example:

import gensim.downloader as api

model = api.load("glove-twitter-200")
model.most_similar(positive=['king', 'woman'], negative=['man'], topn=1)

"""
Output:

[('queen', 0.6820898056030273)]
"""

__testing_word2vec-matrix-synopsis

❗ For testing purposes only ❗ This is a word2vec model of matrix-synopsis.
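
The entry ships no usage example; as a rough sketch it loads the same way as the full-size models (what api.load returns for this test item is not documented here, so the snippet only inspects the result):

import gensim.downloader as api

# Tiny word2vec model built from the Matrix synopsis, for unit tests only.
model = api.load("__testing_word2vec-matrix-synopsis")
print(type(model))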

__testing_matrix-synopsis

❗ For testing purposes only ❗

Source: matrix-synopsis
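
No example is given for the raw test corpus either; a hedged sketch, assuming it streams documents like the other gensim-data corpora:

import gensim.downloader as api

# Tiny test corpus: the synopsis of the movie "The Matrix".
corpus = api.load("__testing_matrix-synopsis")
for doc in corpus:
    print(doc)  # inspect the first streamed item
    break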

20-newsgroups

The notorious collection of newsgroup posts partitioned (nearly) evenly across 20 different newsgroups.

File size: 14 MB
Number of posts: 18,846

Example

import gensim.downloader as api
import json

newsgroups_dataset = api.load("20-newsgroups")
for doc in newsgroups_dataset:
    # Each document is a dict; pretty-print the first one and stop.
    print(json.dumps(doc, indent=4))
    break

"""
Output:
{
    "set": "train",
    "data": "From: [email protected] (D. Andrew Byler)\nSubject: Re: Serbian genocide Work of God?\nOrganization: Freshman, Civil Engineering, Carnegie Mellon, Pittsburgh, PA\nLines: 61\n\nVera Shanti Noyes writes;\n\n>this is what indicates to me that you may believe in predestination.\n>am i correct?  i do not believe in predestination -- i believe we all\n>choose whether or not we will accept God's gift of salvation to us.\n>again, fundamental difference which can't really be resolved.\n\nOf course I believe in Predestination.  It's a very biblical doctrine as\nRomans 8.28-30 shows (among other passages).  Furthermore, the Church\nhas always taught predestination, from the very beginning.  But to say\nthat I believe in Predestination does not mean I do not believe in free\nwill.  Men freely choose the course of their life, which is also\naffected by the grace of God.  However, unlike the Calvinists and\nJansenists, I hold that grace is resistable, otherwise you end up with\nthe idiocy of denying the universal saving will of God (1 Timothy 2.4). \nFor God must give enough grace to all to be saved.  But only the elect,\nwho he foreknew, are predestined and receive the grace of final\nperserverance, which guarantees heaven.  This does not mean that those\nwithout that grace can't be saved, it just means that god foreknew their\nobstinacy and chose not to give it to them, knowing they would not need\nit, as they had freely chosen hell.\n\t\t\t\t\t\t\t  ^^^^^^^^^^^\nPeople who are saved are saved by the grace of God, and not by their own\neffort, for it was God who disposed them to Himself, and predestined\nthem to become saints.  But those who perish in everlasting fire perish\nbecause they hardened their heart and chose to perish.  Thus, they were\ndeserving of God;s punishment, as they had rejected their Creator, and\nsinned against the working of the Holy Spirit.\n\n>yes, it is up to God to judge.  but he will only mete out that\n>punishment at the last judgement. \n\nWell, I would hold that as God most certainly gives everybody some\nblessing for what good they have done (even if it was only a little),\nfor those He can't bless in the next life, He blesses in this one.  And\nthose He will not punish in the next life, will be chastised in this one\nor in Purgatory for their sins.  Every sin incurs some temporal\npunishment, thus, God will punish it unless satisfaction is made for it\n(cf. 2 Samuel 12.13-14, David's sin of Adultery and Murder were\nforgiven, but he was still punished with the death of his child.)  And I\nneed not point out the idea of punishment because of God's judgement is\nquite prevelant in the Bible.  Sodom and Gommorrah, Moses barred from\nthe Holy Land, the slaughter of the Cannanites, Annias and Saphira,\nJerusalem in 70 AD, etc.\n\n> if jesus stopped the stoning of an adulterous woman (perhaps this is\nnot a >good parallel, but i'm going to go with it anyway), why should we\nnot >stop the murder and violation of people who may (or may not) be more\n>innocent?\n\nWe should stop the slaughter of the innocent (cf Proverbs 24.11-12), but\ndoes that mean that Christians should support a war in Bosnia with the\nU.S. or even the U.N. involved?  I do not think so, but I am an\nisolationist, and disagree with foreign adventures in general.  But in\nthe case of Bosnia, I frankly see no excuse for us getting militarily\ninvolved, it would not be a \"just war.\"  \"Blessed\" after all, \"are the\npeacemakers\" was what Our Lord said, not the interventionists.  
Our\nactions in Bosnia must be for peace, and not for a war which is\nunrelated to anything to justify it for us.\n\nAndy Byler\n",
    "id": "21408",
    "topic": "soc.religion.christian"
}
"""

glove-twitter-100

Pre-trained GloVe vectors based on 2B tweets, 27B tokens, 1.2M vocab, uncased.

File size: 387 MB
Number of vectors: 1,193,514
Dimension: 100
License: http://opendatacommons.org/licenses/pddl/

Example

import gensim.downloader as api

model = api.load("glove-twitter-100")
model.most_similar(positive=['table', 'chair'], topn=1)

"""
Output:

[('desk', 0.8098949790000916)]
"""

glove-twitter-50

Pre-trained GloVe vectors based on 2B tweets, 27B tokens, 1.2M vocab, uncased.

File size: 200 MB
Number of vectors: 1,193,514
Dimension: 50
License: http://opendatacommons.org/licenses/pddl/

Example

import gensim.downloader as api

model = api.load("glove-twitter-50")
model.most_similar(positive=['human', 'crime'], negative=['party'], topn=1)

"""
Output:

[('disease', 0.7200273871421814)]
"""

glove-twitter-25

Pre-trained GloVe vectors based on 2B tweets, 27B tokens, 1.2M vocab, uncased.

File size: 105 MB
Number of vectors: 1,193,514
Dimension: 25
License: http://opendatacommons.org/licenses/pddl/

Example

import gensim.downloader as api
model = api.load("glove-twitter-25")
model.most_similar(positive=['fruit', 'flower'], topn=1)

"""
Output:

[('cherry', 0.9183273911476135)]
"""

glove-wiki-gigaword-300

Pre-trained GloVe vectors based on Wikipedia 2014 + Gigaword, 5.6B tokens, uncased.

File size: 376 MB
Number of vectors: 400,000
Dimension: 300
License: http://opendatacommons.org/licenses/pddl/

Example

import gensim.downloader as api
model = api.load("glove-wiki-gigaword-300")
model.most_similar(positive=['mature', 'boy'], topn=1)

"""
Output:

[('girl', 0.6623601913452148)]
"""

glove-wiki-gigaword-200

Pre-trained GloVe vectors based on Wikipedia 2014 + Gigaword, 5.6B tokens, uncased.

File size: 252 MB
Number of vectors: 400,000
Dimension: 200
License: http://opendatacommons.org/licenses/pddl/

Example

import gensim.downloader as api
model = api.load("glove-wiki-gigaword-200")
model.most_similar(positive=['tomato'], negative=['fruit'], topn=1)

"""
Output:

[('marinara', 0.48418283462524414)]
"""

glove-wiki-gigaword-100

Pre-trained GloVe vectors based on Wikipedia 2014 + Gigaword, 5.6B tokens, uncased.

File size: 128 MB
Number of vectors: 400,000
Dimension: 100
License: http://opendatacommons.org/licenses/pddl/

Example

import gensim.downloader as api

model = api.load("glove-wiki-gigaword-100")
model.most_similar(positive=['highest', 'mountain'], topn=1)

"""
Output:

[('peak', 0.7558295726776123)]
"""