Gensim Data

Data repository for pretrained NLP models and NLP corpora.

fasttext-wiki-news-subwords-300

6 years ago

1 million pre-trained fastText word vectors, trained with subword information on Wikipedia 2017, the UMBC WebBase corpus and the statmt.org news dataset (16B tokens).

Feature              Description
File size            959MB
Number of vectors    999,999
Dimension            300
License              https://creativecommons.org/licenses/by-sa/3.0/
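The "subwords" in the model name refers to fastText's subword information: besides the word itself, each word is represented through its character n-grams. A minimal sketch of how those n-grams are extracted, assuming fastText's default n-gram range of 3 to 6 characters and its "<" / ">" word-boundary markers. The function name char_ngrams is hypothetical, and the hashing of n-grams into buckets is left out.

```python
def char_ngrams(word, minn=3, maxn=6):
    """Return fastText-style character n-grams for `word`.

    Sketch only: assumes the default range minn=3..maxn=6 and the '<' / '>'
    boundary markers; bucket hashing of the n-grams is omitted.
    """
    padded = "<" + word + ">"
    ngrams = []
    for start in range(len(padded)):
        for n in range(minn, maxn + 1):
            end = start + n
            if end > len(padded):
                break  # longer n-grams would run past the end marker
            ngrams.append(padded[start:end])
    return ngrams


print(char_ngrams("where", minn=3, maxn=3))
# ['<wh', 'whe', 'her', 'ere', 're>']
```

Because n-grams are shared across words, related forms such as "russia" and "russias" end up with overlapping representations.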

Read more:

Example:

import gensim.downloader as api

model = api.load("fasttext-wiki-news-subwords-300")
model.most_similar(positive=["russia", "river"])

"""
Output:

[(u'russias', 0.6939424276351929),
 (u'danube', 0.6881916522979736),
 (u'river.', 0.6683923006057739),
 (u'crimea', 0.6638611555099487),
 (u'rhine', 0.6632323861122131),
 (u'rivermouth', 0.6602864265441895),
 (u'wester', 0.6586191058158875),
 (u'finland', 0.6585439443588257),
 (u'volga', 0.6576792001724243),
 (u'ukraine', 0.6569074392318726)]

"""

semeval-2016-2017-task3-subtaskBC

6 years ago

The SemEval 2016/2017 Task 3 Subtask B and C datasets contain a train+development set (317 original questions, 3,169 related questions, and 31,690 comments) and test sets, all in English. The tasks and the collected data are described in sections 3 and 4.1 of the 2016 task paper, linked in the “Papers” section of issue #18.

Related issue #18

Attribute            Value
File size            6MB
Number of records    4 (top-level entries)

Read more:

Produced by: https://github.com/Witiko/semeval-2016_2017-task3-subtaskB-english

Example:

import gensim.downloader as api
from gensim.corpora import Dictionary
from gensim.similarities import MatrixSimilarity
from gensim.utils import simple_preprocess
import numpy as np


def read_corpus():
    for thread in api.load("semeval-2016-2017-task3-subtaskA-unannotated"):
        yield simple_preprocess(thread["RelQuestion"]["RelQSubject"])
        yield simple_preprocess(thread["RelQuestion"]["RelQBody"])
        for relcomment in thread["RelComments"]:
            yield simple_preprocess(relcomment["RelCText"])


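# Build the vocabulary from the unannotated Subtask A corpus, then load the annotated Subtask B/C data.
dictionary = Dictionary(read_corpus())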
datasets = api.load("semeval-2016-2017-task3-subtaskBC")


def produce_test_data(dataset):
    for orgquestion in datasets[dataset]:
        relquestions = [
            (
                dictionary.doc2bow(simple_preprocess(thread["RelQuestion"]["RelQSubject"]) + simple_preprocess(thread["RelQuestion"]["RelQBody"])),
                thread["RelQuestion"]["RELQ_RELEVANCE2ORGQ"] in ("PerfectMatch", "Relevant")
            )
            for thread in orgquestion["Threads"]
        ]

        relcomments = [
            (
                dictionary.doc2bow(simple_preprocess(relcomment["RelCText"])),
                relcomment["RELC_RELEVANCE2ORGQ"] == "Good"
            )
            for thread in orgquestion["Threads"] for relcomment in thread["RelComments"]
        ]

        orgquestion = dictionary.doc2bow(simple_preprocess(orgquestion["OrgQSubject"]) + simple_preprocess(orgquestion["OrgQBody"]))
        yield orgquestion, dict(subtaskB=relquestions, subtaskC=relcomments)


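def average_precision(similarities, relevance):
    # Rank documents by descending similarity (ties favour relevant ones) and
    # record the 0-based rank of every relevant document.
    relevant_ranks = [
        rank for rank, (_, relevant) in enumerate(
            sorted(zip(similarities, relevance), reverse=True))
        if relevant
    ]
    # Precision at the k-th relevant hit is k / (rank + 1); AP is their mean.
    precision = [
        (num_correct + 1) / (rank + 1)
        for num_correct, rank in enumerate(relevant_ranks)
    ]

    return np.mean(precision) if precision else 0.0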


def evaluate(dataset, subtask):
    results = []
    for orgquestion, subtasks in produce_test_data(dataset):
        documents, relevance = zip(*subtasks[subtask])
        # Index the candidates and score each one against the original question.
        index = MatrixSimilarity(documents, num_features=len(dictionary))
        similarities = index[orgquestion]
        results.append(average_precision(similarities, relevance))

    # Mean average precision (MAP) over all original questions, as a percentage.
    return np.mean(results) * 100.0


for dataset in ("2016-dev", "2016-test", "2017-test"):
    print("MAP score on the {} dataset:\t{:.2f} (Subtask B)\t{:.2f} (Subtask C)".format(dataset, evaluate(dataset, "subtaskB"), evaluate(dataset, "subtaskC")))



"""
Output:

MAP score on the 2016-dev dataset:	41.89 (Subtask B)	3.33 (Subtask C)
MAP score on the 2016-test dataset:	51.42 (Subtask B)	5.59 (Subtask C)
MAP score on the 2017-test dataset:	23.65 (Subtask B)	0.74 (Subtask C)
"""

semeval-2016-2017-task3-subtaskA-unannotated

6 years ago

The SemEval 2016/2017 Task 3 Subtask A unannotated dataset contains 189,941 questions and 1,894,456 comments in English, collected from the Qatar Living Community Question Answering (CQA) web forum. It can be used as a corpus for language modelling.

Related issue #18

Attribute            Value
File size            224MB
Number of records    189,941

Read more:

Produced by: https://github.com/Witiko/semeval-2016_2017-task3-subtaskA-unannotated-english

Example:

import gensim.downloader as api


for thread in api.load("semeval-2016-2017-task3-subtaskA-unannotated"):
    print("Question subjects: {}\n".format(thread["RelQuestion"]["RelQSubject"]))
    print("Question body: {}\n".format(thread["RelQuestion"]["RelQBody"]))
    print("Relevat comments: ")
    for idx, relcomment in enumerate(thread["RelComments"]):
        print("\t#{}: {}\n".format(idx + 1, relcomment["RelCText"]))
    break

"""
Output:

Question subjects: Thailand:IT Minsitry blocks CNN; Facebook;

Question body: The state of Internet in Thailand:IT Minsitry blocks CNN; Facebook; Yahoo; Flickr Thai Immigration website listed as dangerousFull story: http://www.thaivisa.com/forum/Thai-Govt-Blocks-Cnn-Yahoo-Financ-t321851.html

Relevat comments: 
	#1: have they blocked porn??? <img src="http://www.qatarliving.com/files/images/Da.gif">

	#2: like trying to contain a tsunami with a hand towel ************************************ I'm Jack's complete lack of surprise

	#3: oops double post.. ----------------- "HE WHO DARES WINS" Derek Edward Trotter

	#4: What next they gonna ban all *** tourist from entering the country? ----------------- "HE WHO DARES WINS" Derek Edward Trotter

	#5: Or you can always make your own there with some thai babys Rules are a guideline for intelligent people; but they must be adhered to by idiots.

	#6: why CNN? they want to die ignorant of what happens around?
"""

patent-2017

6 years ago

Raw full text and metadata of patent grants, from the US Patent and Trademark Office (USPTO), as distributed by Reed Tech.

Contains the full text of each patent grant issued in 2017, including tables, International Patent Classification (IPC) and Cooperative Patent Classification (CPC) data, sequence data, and 'in-line' mathematical expressions.

Read more about dataset history, usage and conditions:

Attribute            Value
File size            3GB
Number of patents    353,197

For alternative patent datasets, see the discussion in issue #8.
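Each record is a nested dict. Judging from the example output, paragraphs in document["description"]["p"] are either plain strings or dicts that keep only markup such as reference numerals ("b") and figure labels ("figref"), and claim text sits under nested "claim-text" keys. A small sketch for pulling out the plain text, assuming that layout; the helper names and the toy record are hypothetical.

```python
def plain_paragraphs(document):
    """Return the description paragraphs that are plain strings, skipping markup-only dicts."""
    return [p for p in document.get("description", {}).get("p", []) if isinstance(p, str)]


def claim_strings(node, under_claim_text=False):
    """Recursively collect strings stored under "claim-text" keys (ignores "claim-ref")."""
    if isinstance(node, str):
        return [node] if under_claim_text else []
    if isinstance(node, list):
        return [s for item in node for s in claim_strings(item, under_claim_text)]
    if isinstance(node, dict):
        return [s for key, value in node.items() for s in claim_strings(value, key == "claim-text")]
    return []


# Hypothetical toy record mimicking the structure of the example output.
doc = {
    "description": {
        "p": ["First paragraph.", {"b": [10, 12], "figref": "FIG. 1"}, "Second paragraph."],
        "heading": ["FIELD OF THE INVENTION"],
    },
    "abstract": {"p": "A toy abstract."},
    "claims": {"claim": [
        {"claim-text": {"claim-text": ["circuitry configured for delivering pulses;", "memory configured for storing a template."]}},
        {"claim-text": {"claim-ref": "claim 1"}},
    ]},
}
print(plain_paragraphs(doc))  # ['First paragraph.', 'Second paragraph.']
print(claim_strings(doc["claims"]))
```

Markup-only dicts are dropped rather than flattened, since their numbers and figure labels carry little text on their own.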

Example:

import gensim.downloader as api
import json

dataset = api.load("patent-2017")
for document in dataset:
    # Each document is a nested dict; print the first one and stop.
    print(json.dumps(document, indent=2))
    break

"""
Output:

{
  "description": {
    "p": [
      "The present application claims the benefit under 35 U.S.C. \u00a7119 to U.S. provisional patent application Ser. No. 61/768,295, filed Feb. 22, 2013. The foregoing application is hereby incorporated by reference into the present application in its entirety.", 
      "The present inventions relate to tissue stimulation systems, and more particularly, to systems and methods for adjusting the stimulation provided to tissue to minimize the energy requirements of the systems.", 
      "Implantable neurostimulation systems have proven therapeutic in a wide variety of diseases and disorders. Pacemakers and Implantable Cardiac Defibrillators (ICDs) have proven highly effective in the treatment of a number of cardiac conditions (e.g., arrhythmias). Spinal Cord Stimulation (SCS) systems have long been accepted as a therapeutic modality for the treatment of chronic pain syndromes, and the application of spinal stimulation has begun to expand to additional applications, such as angina pectoris and incontinence. Deep Brain Stimulation (DBS) has also been applied therapeutically for well over a decade for the treatment of refractory Parkinson's Disease, and DBS has also recently been applied in additional areas, such as essential tremor and epilepsy. Further, in recent investigations, Peripheral Nerve Stimulation (PNS) systems have demonstrated efficacy in the treatment of chronic pain syndromes and incontinence, and a number of additional applications are currently under investigation. Furthermore, Functional Electrical Stimulation (FES) systems such as the Freehand system by NeuroControl (Cleveland, Ohio) have been applied to restore some functionality to paralyzed extremities in spinal cord injury patients.", 
      "Each of these implantable neurostimulation systems typically includes one or more electrode carrying stimulation leads, which are implanted at the desired stimulation site, and a neurostimulation device implanted remotely from the stimulation site, but coupled either directly to the stimulation lead(s) or indirectly to the stimulation lead(s) via a lead extension. Thus, electrical pulses can be delivered from the neurostimulation device to the electrode(s) to activate a volume of tissue in accordance with a set of stimulation parameters and provide the desired efficacious therapy to the patient. In particular, electrical energy conveyed between at least one cathodic electrode and at least one anodic electrode creates an electrical field, which when strong enough, depolarizes (or \u201cstimulates\u201d) the neurons beyond a threshold level, thereby evoking action potentials (APs) that propagate along the neural fibers. A typical stimulation parameter set may include the electrodes that are sourcing (anodes) or returning (cathodes) the modulating current at any given time, as well as the amplitude, duration, and rate of the stimulation pulses.", 
      "The neurostimulation system may further comprise a handheld patient programmer to remotely instruct the neurostimulation device to generate electrical stimulation pulses in accordance with selected stimulation parameters. The handheld programmer in the form of a remote control (RC) may, itself, be programmed by a clinician, for example, by using a clinician's programmer (CP), which typically includes a general purpose computer, such as a laptop, with a programming software package installed thereon.", 
      "Of course, neurostimulation devices are active devices requiring energy for operation, and thus, the neurostimulation system may oftentimes includes an external charger to recharge a neurostimulation device, so that a surgical procedure to replace a power depleted neurostimulation device can be avoided. To wirelessly convey energy between the external charger and the implanted neurostimulation device, the charger typically includes an alternating current (AC) charging coil that supplies energy to a similar charging coil located in or on the neurostimulation device. The energy received by the charging coil located on the neurostimulation device can then be used to directly power the electronic componentry contained within the neurostimulation device, or can be stored in a rechargeable battery within the neurostimulation device, which can then be used to power the electronic componentry on-demand.", 
      "Typically, the therapeutic effect for any given neurostimulation application may be optimized by adjusting the stimulation parameters. Although the threshold for evoking action potentials may be a good indication of whether a desired therapeutic result is achieved, it is usually not directly observable when programming the neurostimulation device. For this reason, the programmer of the neurostimulation system is often required to identify the efficacy threshold and the side-effect threshold based on the patient's perception. For instance, the programmer of the neurostimulation system may identify the efficacy threshold by asking the patient whether the pain is relieved or perceived paresthesia, and record the set of stimulation parameters of that stimulation level. Similarly, the side-effect threshold is identified by adjusting the stimulation until the patient perceives any undesired side-effects such as slurred speech or involuntary muscle contraction, and records the set of stimulation parameters of that stimulation level. Then, the neurostimulation system is configured with a certain set of stimulation parameters to generate stimulation at an arbitrary level within the therapeutic window so that the stimulation is perceptible by the patient without causing any undesirable side effects.", 
      "There are a few issues that need to be considered when using this approach. Many neurostimulation therapies take time to develop the clinical benefit. For example, the patient may need to be on a certain level of stimulation for a few hours or even days before he or she can actually feel the pain relief or regain muscles mobility. Also, the side effect threshold is often not perfectly correlated with the therapeutic effect. Therefore, relying on the subjective clinical assessment (e.g., perception threshold) at the acute setting and configuring the stimulation parameters may result in an erroneous therapeutic window. Moreover, various changes, including postural changes, leads movement and tissue maturation, may occur in the patient during the course of therapy, and the stimulation parameters may need to be re-calibrated using the same unreliable subjective clinical assessment approach, thus the therapeutic window is often chosen to be very broad. That is, the gap between the efficacy threshold and the side-effect threshold is set as far as possible. In order to prevent under-stimulation and over-stimulation, a set of stimulation parameters are chosen to generate a stimulation pulse at the mid-level of the wide therapeutic window. The set of stimulation parameters for generating such stimulation pulse is more energy-intensive than necessary to achieve the therapy, which in turn causes decreased battery life, more frequent recharge cycles, and/or in the case where non-chargeable primary cell devices are used, more frequent surgeries for replacing the battery.", 
      "There, thus, remains a need to decrease the energy requirements for neurostimulation therapy.", 
      "In accordance with the present inventions, a neurostimulation system is provided. The system comprises stimulation output circuitry configured for delivering stimulation pulses to target tissue in accordance with a set of stimulation parameters (e.g., at least one of a pulse amplitude, a pulse width, a pulse rate, a duty cycle, a burst rate, and an electrode combination), monitoring circuitry configured for continuously measuring action potentials evoked in the target tissue (e.g., one of an evoked compound action potential and an evoked compound muscle action potential) in response to the delivery of the stimulation pulses to the target tissue, memory configured for storing a characteristic of a reference evoked action potential (e.g., at least one of peak delay, width, amplitude, and waveform morphology), which may be a therapeutic evoked action potential or a side-effect evoked action potential, and at least one processor configured for initiating an automatic mode, in which a characteristic of the measured evoked action potentials is compared to the corresponding characteristic of the reference evoked action potential, and one or more stimulation parameter values in the set of stimulation parameters are adjusted to decrease or increase the energy level of the stimulation pulses, thereby evoking action potentials in the target tissue having substantially the same corresponding characteristic as the reference evoked action potential.", 
      "In one embodiment, the processor(s) is configured for triggering the automatic mode based on one or more of the following pre-defined conditions: (a) immediately upon measuring evoked action potentials having a characteristic different from the characteristic of the reference evoked action potential; (b) upon measuring evoked action potentials having a characteristic different from the characteristic of the reference action evoked action potential by more than a predetermined tolerance threshold; (c) upon measuring evoked action potentials having a characteristic different from the characteristic of the reference evoked action potential for more than a predetermined time period, and (d) upon measuring evoked action potentials having a characteristic different from the characteristic of the reference evoked action potential for more than a predetermined number of measurements.", 
      "In another embodiment, the processor(s) is configured for halting or resuming the automatic mode based on one or more conditions comprising patient's movement, patient's temperature, patient's blood flow, electrocortigram, electroencephalogram, tissue or transcutaneous oxygen tension, glucose concentration, impedance measurement, chemical species concentration, and whether the patient is asleep or awake. The processor(s) may be configured for selecting the stimulation parameter to be adjusted and the step size for the adjustment. In an optional embodiment, the processor(s) is configured for generating an alert upon initiating the automatic stimulation adjustment mode, thereby allowing manual adjustment of the one or more stimulation parameter values. The processor(s) may be configured for alternately using two or more of the reference evoked action potentials based on a predefined therapeutic schedule.", 
      "In one embodiment, the automatic mode is an automatic stimulation adjustment mode. In this case, the processor(s) may be configured for using the comparison between the measured evoked action potentials and the reference evoked action potential to determine whether the stimulation pulses delivered to the target tissue was an over-stimulation or an under-stimulation of the target tissue, and the stimulation parameter value(s) may be adjusted to gradually decrease or increase the energy level of the stimulation pulses, respectively, until the measured evoked action potentials have substantially the same characteristic as the reference evoked action potential.", 
      "In another embodiment, the automatic mode is an automatic power consumption optimization mode. In this case, the stimulation parameter value(s) may be adjusted to decrease the energy level of the stimulation pulses, thereby evoking action potentials in the target tissue having substantially the same corresponding characteristic as the reference evoked action potential. Furthermore, the memory may be configured for storing a threshold stimulation parameter value and a template identifying the characteristic of the reference evoked action potential, and the processor(s) may be configured for (a) adjusting at least one stimulation parameter value in the threshold stimulation parameter set by a step size; (b) measuring an action potential evoked in the target tissue by actuating the stimulation output circuitry to generate a stimulation pulse in accordance with the stimulation parameter value(s); (c) comparing the measured evoked action potential to the template; (d) replacing the threshold stimulation parameter value in the threshold stimulation parameter set with the adjusted stimulation parameter value(s) when the characteristic of the measured evoked action potential matches the template, and (e) repeating steps (a)-(d) to identify the most energy efficient set of stimulation parameters capable of generating evoked action potential from the target tissue having substantially the same characteristic as the reference evoked action potential.", 
      "The stimulation output circuitry, the monitoring circuitry, the processor(s), and the memory may be implemented in a single device, such as an implantable electric pulse generator. In another embodiment, the stimulation output circuitry, the monitoring circuitry, the processor(s), and the memory may be implemented within a plurality of devices.", 
      "Other and further aspects and features of the invention will be evident from reading the following detailed description of the preferred embodiments, which are intended to illustrate, not limit, the invention.", 
      "The present disclosure relates to a system and method for automatically minimizing the power consumption of neurostimulation systems while maintaining the stimulation pulse at efficacious level. The neurostimulation system of the present disclosure uses evoked action potential as an indicator for determining the effectiveness of therapeutic effect of electrical stimulation pulse at the target neural tissue. Evoked action potential is electrical signal generated by the nerve tissues in response to sensory or external stimuli. Characteristics of an evoked action potential that correlates to a certain therapeutic effect is stored as a template (for example, reference evoked action potential) for matching against other electrophysiological signals that are recorded later. The comparison between the characteristics of the recorded evoked action potential and the characteristics of the targeted evoked action potential (i.e., the template) provides an objective assessment as to the effectiveness of the stimulation. This objective and quantitative measurement allows for the system to automatically adjust the stimulation parameters to maintain the efficacious therapeutic effect with the minimal power consumption requirement.", 
      "In this disclosure, various technical features are described in relation to a spinal column stimulation (SCS) system. The SCS system is configured to apply at least one stimulus to targeted neural tissue to provide one or more medical, psychiatric, and/or neurological therapeutic effects. However, it should be appreciated that the disclosure may not be so limited to an SCS system, but rather the features disclosed herein may be used with any other types of implantable electrical stimulation systems. For example, the present disclosure may be used as part of a pacemaker, a defibrillator, a cochlear modulator device, a retinal modulator device, a modulator device configured to produce coordinated limb movement, a cortical modulator device, a deep brain modulator device, an occipital nerve modulator device, a peripheral nerve modulator device, a micro-modulator device, or in any other tissue modulator device configured to treat urinary incontinence, sleep apnea, shoulder sublaxation, headache, and similar ailments.", 
      {"b": [10, 12, 1, 12, 2, 14, 16, 18, 20, 22], "figref": "FIG. 1"}, 
      {"b": [14, 12, 24, 12, 26, 12, 26, 12, 26, 14, 26]}, 
      {"b": [20, 28, 30, 12, 20, 14, 26, 20, 14, 20, 20, 12, 14, 14, 14, 20, 20, 14, 20]}, 
      {"b": [16, 20, 32, 14, 12, 16, 14, 34, 14, 14, 14]}, 
      {"b": [18, 14, 20, 18, 14, 20, 16, 36, 18, 14, 20, 18, 16, 16, 18]}, 
      {"b": [16, 18, 16, 18]}, 
      {"b": [14, 16, 18, 14, 14, 16, 18, 14, 18, 14, 16, 14, 16]}, 
      {"b": [22, 14, 38, 14, 22, 14, 16, 18, 20, 22]}, 
      {"b": [12, 14, 12, 1, 26, 1, 8, 12, 2, 26, 9, 16], "figref": "FIG. 2"}, 
      {"b": [14, 40, 42, 12, 26, 40, 40, 40]}, 
      {"b": [14, 26, 14]}, 
      {"b": [10, 40]}, 
      {"b": [26, 40, 14, 26, 40, 40, 26, 26, 26, 26, 26, 26]}, 
      "The electrical energy (i.e., stimulation pulse) may be delivered between electrodes as monophasic electrical energy or multiphasic electrical energy. Monophasic electrical energy includes a series of pulses that are either all positive (anodic) or all negative (cathodic). Multiphasic electrical energy includes a series of pulses that alternate between positive and negative. For example, multiphasic electrical energy may include a series of biphasic pulses, with each biphasic pulse including a cathodic (negative) stimulation pulse and an anodic (positive) recharge pulse that is generated after the stimulation pulse to prevent direct current charge transfer through the tissue, thereby avoiding electrode degradation and cell trauma.", 
      "That is, a charge is conveyed through the electrode-tissue interface via current at an electrode during a stimulation period (the length of the stimulation pulse), and then pulled back off the electrode-tissue interface via an oppositely polarized current at the same electrode during a recharge period (the length of the recharge pulse). The recharge pulse may be active, in which case, the electrical current is actively conveyed through the electrode via current or voltage sources, or the recharge pulse may be passive, in which case, the electrical current may be passively conveyed through the electrode via redistribution of the charge flowing from coupling capacitances present in the circuit.", 
      {"b": [12, 46, 48, 12, 12, 12, 12, 46, 14], "figref": "FIG. 3"}, 
      {"b": [14, 24, 14, 12, 18, 14, 16]}, 
      {"b": [14, 14, 50, 52, 54, 56, 50, 50, 1, 16, 58, 1, 16], "figref": "FIG. 4"}, 
      {"b": [50, 58, 58, 58, 50]}, 
      {"b": [14, 60, 62, 14, 14, 60, 26, 26]}, 
      {"b": [14, 64, 52, 66, 60, 68, 14, 56, 14, 70, 72, 64, 64, 70, 72, 70]}, 
      {"b": [64, 64, 14, 14, 64, 26, 50, 52, 56, 26, 26]}, 
      {"b": [14, 74, 16, 18, 76, 74, 70, 14]}, 
      {"b": [14, 78, 80, 60, 16, 18, 14, 16, 18, 14, 16, 18, 16, 18, 14, 16, 18]}, 
      {"b": [64, 26, 60, 60, 70, 16, 18, 16, 18, 74, 76, 16, 18, 14, 14, 16, 18, 78, 80, 16, 18, 14, 14]}, 
      {"b": [14, 82, 84, 14, 82, 82, 84, 84, 86, 14, 82, 74, 82, 14, 74, 76, 82, 74, 74, 80]}, 
      {"b": [10, 12]}, 
      {"b": [16, 16, 14, 18, 20, 16, 100, 102, 104, 100, 102, 104, 102, 104, 106, 108, 110, 112, 14, 14], "figref": "FIG. 5"}, 
      {"b": [106, 14, 108, 106, 110, 112, 14, 108, 16, 110, 112, 110, 112, 110, 112]}, 
      {"b": [108, 16, 14, 150, 152, 14, 14, 16, 150, 154, 14, 16, 14, 16], "figref": "FIG. 6"}, 
      {"i": ["a", "b", "c", "d"], "b": [150, 156, 14, 156, 156, 156, 156, 150, 158, 158, 150, 160]}, 
      {"b": [16, 16, 114, 116, 114, 16, 120, 104, 102, 118, 14, 114, 104, 14, 114, 14, 118, 116, 114, 14, 118, 14, 118, 18, 16], "figref": "FIG. 7"}, 
      {"b": [16, 18], "figref": "FIGS. 4-7"}, 
      {"b": 14}, 
      {"b": [26, 26, 26]}, 
      {"b": [800, 1], "figref": ["FIG. 8",  "FIG. 8"]}, 
      {"b": [14, 14, 14, 10]}, 
      {"b": [10, 10, 16, 18, 10]}, 
      {"b": [10, 26, 10]}, 
      {"b": [10, 10, 10, 10, 10]}, 
      {"b": [910, 10, 920, 26, 10, 14, 930, 14, 940, 950, 10, 950], "figref": "FIG. 9"}, 
      {"b": [26, 910, 10, 14, 920, 10, 950, 10, 960, 10, 10, 970]}, 
      {"b": 10}, 
      "It should be appreciated that an equivalent or substantially same evoked action potential may result from various alternative sets of stimulation parameters. For example, a lower amplitude stimulation pulse closer to the target tissue and a higher amplitude stimulation pulse from a distance may result in the same evoked action potential. Likewise, an evoked action potential measurement in response to a lower amplitude stimulation pulse at higher pulse rate (e.g., \u201chigher frequency\u201d) may be have substantially same characteristics as the characteristics of evoked action potential in response to a higher amplitude stimulation pulse at slower pulse rate.", 
      {"b": [10, 10, 10, 10]}, 
      {"b": [14, 14, 14]}, 
      {"b": [10, 14, 26]}, 
      {"b": [10, 10]}, 
      {"b": [16, 18, 1100, 14], "figref": ["FIG. 10", "FIG. 10", "FIG. 10"]}, 
      "As mentioned above, the evoked action potentials can be measured by either the same electrodes that were used for the stimulation or other electrodes near the stimulated neural tissues. The evoked action potential measurements from the group of stimulated neural elements (e.g., neurons, muscle fibers) may be processed in an appropriate manner so that a reliable determination of evoked action potential can be made. For instance, evoked action potential measurements may be averaged to obtain evoked compound action potentials or compound muscle action potentials. Further, an artifact or noise suppression process may be implemented by using hardware (e.g., blanking circuit) or software to prevent noises (e.g., stimulation artifact) from contaminating the compound evoked action potential measurement.", 
      {"b": 1200}, 
      {"b": [1300, 14, 10, 10]}, 
      {"b": 14, "figref": "FIG. 10"}, 
      {"b": [1400, 1500, 1600, 10]}, 
      {"b": [1800, 1700, 10, 1400]}, 
      {"b": 10}, 
      {"b": 10}, 
      {"b": 10}, 
      {"b": [1600, 10, 14, 16, 18]}, 
      {"b": 16}, 
      {"b": [10, 14, 12, 10]}, 
      {"b": 10}, 
      {"b": [14, 16, 18, 10]}, 
      "Although particular embodiments of the present inventions have been shown and described, it will be understood that it is not intended to limit the present inventions to the preferred embodiments, and it will be obvious to those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the present inventions. Thus, the present inventions are intended to cover alternatives, modifications, and equivalents, which may be included within the spirit and scope of the present inventions as defined by the claims."
    ], 
    "heading": [
      "RELATED APPLICATION DATA", 
      "FIELD OF THE INVENTION", 
      "BACKGROUND OF THE INVENTION", 
      "SUMMARY OF THE INVENTION", 
      "DETAILED DESCRIPTION OF THE EMBODIMENTS"
    ], 
    "description-of-drawings": {
      "p": [
        "The present disclosure is further described in the detailed description which follows, in reference to the noted plurality of drawings, by way of non-limiting examples of preferred embodiments of the present disclosure, in which like characters represent like elements throughout the several views of the drawings.", 
        {"figref": "FIG. 1"}, 
        {"figref": ["FIG. 2", "FIG. 1"]}, 
        {"figref": ["FIG. 3", "FIG. 1"]}, 
        {"figref": ["FIG. 4", "FIG. 2"]}, 
        {"figref": ["FIG. 5", "FIG. 1"]}, 
        {"figref": ["FIG. 6", "FIG. 5", "FIG. 2"]}, 
        {"figref": ["FIG. 7", "FIG. 5"]}, 
        {"figref": "FIG. 8"}, 
        {"figref": "FIG. 9"}, 
        {"figref": ["FIG. 10", "FIG. 2"]}
      ], 
      "heading": "BRIEF DESCRIPTION OF THE DRAWINGS"
    }
  }, 
  "abstract": {
    "p": "A neurostimulation system comprising stimulation output circuitry configured for delivering stimulation pulses to target tissue in accordance with a set of stimulation parameters. The neurostimulation system comprises monitoring circuitry configured for continuously measuring action potentials evoked in the target tissue in response to the delivery of the stimulation pulses to the target tissue, memory configured for storing a characteristic of a reference evoked action potential, and at least one processor configured for initiating an automatic mode, in which a characteristic of the measured evoked action potentials is compared to the corresponding characteristic of the reference evoked action potential, and one or more stimulation parameter values in the set of stimulation parameters are adjusted to decrease or increase the energy level of the stimulation pulses, thereby evoking action potentials in the target tissue having substantially the same corresponding characteristic as the reference evoked action potential."
  }, 
  "drawings": {
    "figure": [
      {"img": null}, 
      {"img": null}, 
      {"img": null}, 
      {"img": null}, 
      {"img": null}, 
      {"img": null}, 
      {"img": null}, 
      {"img": null}, 
      {"img": null}, 
      {"img": null}, 
      {"img": null}
    ]
  }, 
  "claims": {
    "claim": [
      {
        "claim-text": {
          "claim-text": [
            "stimulation output circuitry configured for delivering stimulation pulses to target tissue in accordance with a set of stimulation parameters;", 
            "monitoring circuitry configured for continuously measuring action potentials evoked in the target tissue in response to the delivery of the stimulation pulses to the target tissue;", 
            "memory configured for storing a reference evoked action potential template, the template storing two or more characteristics of a reference evoked action potential that correlates to effective stimulation; and", 
            "at least one processor configured for implementing an automatic power consumption optimization process to reduce power consumption and maintain effective stimulation, the automatic power consumption optimization process including comparing two or more characteristics of the measured evoked action potentials to the two or more characteristics of the reference evoked action potential template, and adjusting one or more stimulation parameter values in the set of stimulation parameters to decrease the energy level of the stimulation pulses and evoke action potentials in the target tissue having substantially the same two or more characteristics as the two or more characteristics of the reference evoked action potential template."
          ]
        }
      }, 
      {"claim-text": {"claim-ref": "claim 1"}}, 
      {"claim-text": {"claim-ref": "claim 1"}}, 
      {"claim-text": {"claim-ref": "claim 1"}}, 
      {"claim-text": {"claim-ref": "claim 1"}}, 
      {"claim-text": {"claim-ref": "claim 1"}}, 
      {
        "claim-text": {
          "claim-ref": "claim 1", 
          "claim-text": [
            "a. immediately up on measuring evoked action potentials having a characteristic different from the characteristic of the reference evoked action potential;", 
            "b. up on measuring evoked action potentials having a characteristic different from the characteristic of the reference action evoked action potential by more than a predetermined tolerance threshold;", 
            "c. up on measuring evoked action potentials having a characteristic different from the characteristic of the reference evoked action potential for more than a predetermined time period; and", 
            "d. up on measuring evoked action potentials having a characteristic different from the characteristic of the reference evoked action potential for more than a predetermined number of measurements."
          ]
        }
      }, 
      {"claim-text": {"claim-ref": "claim 1"}}, 
      {"claim-text": {"claim-ref": "claim 1"}}, 
      {"claim-text": {"claim-ref": "claim 1"}}, 
      {"claim-text": {"claim-ref": "claim 1"}}, 
      {"claim-text": {"claim-ref": "claim 1"}}, 
      {"claim-text": {"claim-ref": "claim 12"}}, 
      {
        "claim-text": {
          "claim-ref": "claim 1", 
          "claim-text": [
            "(a) adjusting at least one stimulation parameter value in the set of stimulation parameters by a step size;", 
            "(b) measuring an action potential evoked in the target tissue by actuating the stimulation output circuitry to generate a stimulation pulse in accordance with the at least one stimulation parameter value;", 
            "(c) comparing the measured evoked action potential to the template;", 
            "(d) replacing the threshold stimulation parameter value in the threshold stimulation parameter set with the at least one adjusted stimulation parameter value when the two or more characteristics of the measured evoked action potential matches the two or more characteristics of the template; and", 
            "(e) repeating steps (a)-(d) to identify the most energy efficient set of stimulation parameters capable of generating evoked action potential from the target tissue having substantially the same two or more characteristics as the two or more characteristics of the reference evoked action potential."
          ]
        }
      }, 
      {"claim-text": {"claim-ref": "claim 1"}}, 
      {"claim-text": {"claim-ref": "claim 15"}}, 
      {"claim-text": {"claim-ref": "claim 1"}}
    ]
  }, 
  "us-bibliographic-data-grant": {
    "us-references-cited": {
      "us-citation": [
        {
          "category": "cited by applicant", 
          "patcit": {
            "document-id": {
              "date": 20030200, 
              "country": "US", 
              "kind": "B1", 
              "name": "Meadows et al.", 
              "doc-number": 6516227
            }
          }
        }, 
        {
          "category": "cited by applicant", 
          "patcit": {
            "document-id": {
              "date": 20050500, 
              "country": "US", 
              "kind": "B2", 
              "name": "Meadows et al.", 
              "doc-number": 6895280
            }
          }
        }, 
        {
          "category": "cited by applicant", 
          "patcit": {
            "document-id": {
              "date": 20060100, 
              "country": "US", 
              "kind": "B2", 
              "name": "Bradley et al.", 
              "doc-number": 6993384
            }
          }
        }, 
        {
          "category": "cited by applicant", 
          "patcit": {
            "document-id": {
              "date": 20090500, 
              "country": "US", 
              "kind": "B2", 
              "name": "Parramon et al.", 
              "doc-number": 7539538
            }
          }
        }, 
        {
          "category": "cited by applicant", 
          "patcit": {
            "document-id": {
              "date": 20100100, 
              "country": "US", 
              "kind": "B2", 
              "name": "Walter", 
              "doc-number": 7650184
            }
          }
        }, 
        {
          "category": "cited by applicant", 
          "patcit": {
            "document-id": {
              "date": 20110900, 
              "country": "US", 
              "kind": "B2", 
              "name": "Kuzma et al.", 
              "doc-number": 8019439
            }
          }
        }, 
        {
          "category": "cited by applicant", 
          "patcit": {
            "document-id": {
              "date": 20030700, 
              "country": "US", 
              "kind": "A1", 
              "name": "Bradley et al.", 
              "doc-number": "2003/0139781"
            }
          }
        }, 
        {
          "category": "cited by applicant", 
          "patcit": {
            "document-id": {
              "date": 20070600, 
              "country": "US", 
              "kind": "A1", 
              "name": "Anderson", 
              "doc-number": "2007/0150036"
            }
          }
        }, 
        {
          "category": "cited by examiner", 
          "classification-national": {
            "country": "US", 
            "main-classification": "607 46"
          }, 
          "patcit": {
            "document-id": {
              "date": 20140800, 
              "country": "US", 
              "kind": "A1", 
              "name": "Parker et al.", 
              "doc-number": "2014/0236257"
            }
          }
        }, 
        {
          "nplcit": {"othercit": "U.S. Patent Provisional U.S. Appl. No. 61/646,773, System and Method for Shaped Phased Current Delivery, Inventor: Kerry Bradley et al., filed: May 14, 2012."}, 
          "category": "cited by applicant"
        }
      ]
    }, 
    "us-parties": {
      "agents": {
        "agent": {
          "addressbook": {
            "orgname": "Schwegman Lundberg & Woessner, P.A.", 
            "address": {
              "country": "unknown"
            }
          }
        }
      }, 
      "us-applicants": {
        "us-applicant": {
          "residence": {
            "country": "US"
          }, 
          "addressbook": {
            "orgname": "BOSTON SCIENTIFIC NEUROMODULATION CORPORATION", 
            "address": {
              "city": "Valencia", 
              "state": "CA", 
              "country": "US"
            }
          }
        }
      }, 
      "inventors": {
        "inventor": {
          "addressbook": {
            "first-name": "Stephen", 
            "last-name": "Carcieri", 
            "address": {
              "city": "Los Angeles", 
              "state": "CA", 
              "country": "US"
            }
          }
        }
      }
    }, 
    "us-term-of-grant": {
      "us-term-extension": 65
    }, 
    "number-of-claims": 17, 
    "assignees": {
      "assignee": {
        "addressbook": {
          "orgname": "Boston Scientific Neuromodulation Corporation", 
          "role": 2, 
          "address": {
            "city": "Valencia", 
            "state": "CA", 
            "country": "US"
          }
        }
      }
    }, 
    "us-application-series-code": 14, 
    "us-field-of-classification-search": {
      "classification-national": {
        "country": "US", 
        "main-classification": "None"
      }
    }, 
    "us-exemplary-claim": 1, 
    "classifications-ipcr": {
      "classification-ipcr": [
        {
          "classification-status": "B", 
          "classification-value": "I", 
          "action-date": {
            "date": 20170103
          }, 
          "section": "A", 
          "classification-data-source": "H", 
          "subclass": "N", 
          "generating-office": {
            "country": "US"
          }, 
          "classification-level": "A", 
          "symbol-position": "F", 
          "ipc-version-indicator": {
            "date": 20060101
          }, 
          "main-group": 1, 
          "class": 61, 
          "subgroup": 36
        }, 
        {
          "classification-status": "B", 
          "classification-value": "I", 
          "action-date": {
            "date": 20170103
          }, 
          "section": "A", 
          "classification-data-source": "H", 
          "subclass": "B", 
          "generating-office": {
            "country": "US"
          }, 
          "classification-level": "A", 
          "symbol-position": "L", 
          "ipc-version-indicator": {
            "date": 20060101
          }, 
          "main-group": 5, 
          "class": 61, 
          "subgroup": 484
        }, 
        {
          "classification-status": "B", 
          "classification-value": "I", 
          "action-date": {
            "date": 20170103
          }, 
          "section": "A", 
          "classification-data-source": "H", 
          "subclass": "B", 
          "generating-office": {
            "country": "US"
          }, 
          "classification-level": "A", 
          "symbol-position": "L", 
          "ipc-version-indicator": {
            "date": 20060101
          }, 
          "main-group": 5, 
          "class": 61, 
          "subgroup": 0
        }, 
        {
          "classification-status": "B", 
          "classification-value": "N", 
          "action-date": {
            "date": 20170103
          }, 
          "section": "A", 
          "classification-data-source": "H", 
          "subclass": "B", 
          "generating-office": {
            "country": "US"
          }, 
          "classification-level": "A", 
          "symbol-position": "L", 
          "ipc-version-indicator": {
            "date": 20060101
          }, 
          "main-group": 5, 
          "class": 61, 
          "subgroup": 4
        }
      ]
    }, 
    "us-related-documents": {
      "related-publication": {
        "document-id": {
          "date": 20140828, 
          "country": "US", 
          "kind": "A1", 
          "doc-number": 20140243926
        }
      }
    }, 
    "application-reference": {
      "document-id": {
        "date": 20140221, 
        "country": "US", 
        "doc-number": 14187043
      }
    }, 
    "invention-title": "Neurostimulation system and method for automatically adjusting stimulation and reducing energy requirements using evoked action potential", 
    "figures": {
      "number-of-figures": 10, 
      "number-of-drawing-sheets": 10
    }, 
    "publication-reference": {
      "document-id": {
        "date": 20170103, 
        "country": "US", 
        "kind": "B2", 
        "doc-number": 9533148
      }
    }, 
    "examiners": {
      "primary-examiner": {
        "department": 3766, 
        "first-name": "Erica", 
        "last-name": "Lee"
      }
    }, 
    "classifications-cpc": {
      "main-cpc": {
        "classification-cpc": {
          "classification-status": "B", 
          "classification-value": "I", 
          "cpc-version-indicator": {
            "date": 20130101
          }, 
          "classification-data-source": "H", 
          "subclass": "N", 
          "class": 61, 
          "generating-office": {
            "country": "US"
          }, 
          "scheme-origination-code": "C", 
          "symbol-position": "F", 
          "action-date": {
            "date": 20170103
          }, 
          "main-group": 1, 
          "section": "A", 
          "subgroup": 36071
        }
      }, 
      "further-cpc": {
        "classification-cpc": [
          {
            "classification-status": "B", 
            "classification-value": "I", 
            "cpc-version-indicator": {
              "date": 20130101
            }, 
            "classification-data-source": "H", 
            "subclass": "B", 
            "class": 61, 
            "generating-office": {
              "country": "US"
            }, 
            "scheme-origination-code": "C", 
            "symbol-position": "L", 
            "action-date": {
              "date": 20170103
            }, 
            "main-group": 5, 
            "section": "A", 
            "subgroup": 484
          }, 
          {
            "classification-status": "B", 
            "classification-value": "I", 
            "cpc-version-indicator": {
              "date": 20130101
            }, 
            "classification-data-source": "H", 
            "subclass": "B", 
            "class": 61, 
            "generating-office": {
              "country": "US"
            }, 
            "scheme-origination-code": "C", 
            "symbol-position": "L", 
            "action-date": {
              "date": 20170103
            }, 
            "main-group": 5, 
            "section": "A", 
            "subgroup": 486
          }, 
          {
            "classification-status": "B", 
            "classification-value": "I", 
            "cpc-version-indicator": {
              "date": 20130101
            }, 
            "classification-data-source": "H", 
            "subclass": "N", 
            "class": 61, 
            "generating-office": {
              "country": "US"
            }, 
            "scheme-origination-code": "C", 
            "symbol-position": "L", 
            "action-date": {
              "date": 20170103
            }, 
            "main-group": 1, 
            "section": "A", 
            "subgroup": 36139
          }, 
          {
            "classification-status": "B", 
            "classification-value": "A", 
            "cpc-version-indicator": {
              "date": 20130101
            }, 
            "classification-data-source": "H", 
            "subclass": "B", 
            "class": 61, 
            "generating-office": {
              "country": "US"
            }, 
            "scheme-origination-code": "C", 
            "symbol-position": "L", 
            "action-date": {
              "date": 20170103
            }, 
            "main-group": 5, 
            "section": "A", 
            "subgroup": 4001
          }, 
          {
            "classification-status": "B", 
            "classification-value": "A", 
            "cpc-version-indicator": {
              "date": 20130101
            }, 
            "classification-data-source": "H", 
            "subclass": "B", 
            "class": 61, 
            "generating-office": {
              "country": "US"
            }, 
            "scheme-origination-code": "C", 
            "symbol-position": "L", 
            "action-date": {
              "date": 20170103
            }, 
            "main-group": 2560, 
            "section": "A", 
            "subgroup": 209
          }
        ]
      }
    }
  }, 
  "us-claim-statement": "What is claimed is:"
}


"""
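In the record above, every claim sits in the list under `claims` / `claim`. Dependent claims carry a `claim-ref` such as `"claim 1"`, while the independent claim holds its text directly. The sketch below tallies those references. It assumes each record is a dict with exactly the shape printed above. `claim_references` and `independent_claim_count` are hypothetical helpers, not part of gensim.

```python
from collections import Counter


def claim_references(record):
    """Count how often each claim is referenced by dependent claims.

    Hypothetical helper: assumes `record` is a dict shaped like the
    patent record printed above.
    """
    refs = Counter()
    for claim in record["claims"]["claim"]:
        text = claim.get("claim-text", {})
        # Only dict-valued claim texts can carry a "claim-ref" key.
        ref = text.get("claim-ref") if isinstance(text, dict) else None
        if ref is not None:
            refs[ref] += 1
    return refs


def independent_claim_count(record):
    # Claims without a "claim-ref" are treated as independent claims.
    total = len(record["claims"]["claim"])
    return total - sum(claim_references(record).values())
```

For the record printed above, the dependent claims add up to 16 (14 of them refer to claim 1). That leaves one independent claim, which agrees with the `"number-of-claims": 17` field in its bibliographic data.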

conceptnet-numberbatch-17-06-300

6 years ago

ConceptNet Numberbatch consists of state-of-the-art semantic vectors (also known as word embeddings) that can be used directly as a representation of word meanings or as a starting point for further machine learning.

Related issue #9.

attribute value
File size 1.14GB
Number of vectors 1917247
Dimension 300
License https://github.com/commonsense/conceptnet-numberbatch/blob/master/LICENSE.txt

Read more:

Example

import gensim.downloader as api

model = api.load("conceptnet-numberbatch-17-06-300")
for word, similarity in model.most_similar("/c/en/beer"):
    print(u"{}: {:4f}".format(word, similarity))

"""
output:

/c/ca/birra: 0.995633
/c/eu/zerbeza: 0.995058
/c/hi/बियर: 0.994754
/c/ja/ビア: 0.994656
/c/ja/ビヤ: 0.994406
/c/ja/ビーア: 0.994406
/c/eu/garagardo: 0.994178
/c/ku/بیرە: 0.993689
/c/eu/biera: 0.993634
/c/sh/пиво: 0.992218
"""
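The neighbours above are ConceptNet URIs of the form `/c/<language>/<term>`, and the default ten results contain no English terms at all. The sketch below restricts results to one language. It assumes every key follows that URI pattern, as in the output shown. `filter_language` and `term_of` are hypothetical helpers.

```python
def filter_language(results, lang):
    """Keep (uri, score) pairs whose ConceptNet URI is in language `lang`.

    Assumes URIs of the form "/c/<lang>/<term>", as in the output above.
    """
    prefix = "/c/{}/".format(lang)
    return [(uri, score) for uri, score in results if uri.startswith(prefix)]


def term_of(uri):
    # "/c/ja/ビア".split("/") == ["", "c", "ja", "ビア"]: index 3 is the term.
    return uri.split("/")[3]
```

For example, `filter_language(model.most_similar("/c/en/beer", topn=100), "en")` widens the candidate list with the `topn` parameter before keeping only English neighbours.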

word2vec-ruscorpora-300

6 years ago

Word2vec Continuous Skipgram vectors trained on the full Russian National Corpus (about 250M words).

Related issue https://github.com/RaRe-Technologies/gensim-data/issues/3.

attribute value
File size 199MB
Number of vectors 184973
Preprocessing The training corpus was lemmatized and tagged with Universal PoS tags
Window size 10
Dimension 300
License https://creativecommons.org/licenses/by/4.0/deed.en

Read more:

Example

import gensim.downloader as api

model = api.load("word2vec-ruscorpora-300")
for word, similarity in model.most_similar(u"кот_NOUN"):
    print(u"{}: {:.3f}".format(word, similarity))

"""
output:

кошка_NOUN: 0.757
котенок_NOUN: 0.668
пес_NOUN: 0.563
мяукать_VERB: 0.562
тобик_NOUN: 0.559
фоксик_NOUN: 0.557
собака_NOUN: 0.557
мяучать_VERB: 0.554
харлашка_NOUN: 0.552
котяра_NOUN: 0.551
"""
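Because the training corpus was lemmatized and PoS-tagged, every key is a lemma joined to its Universal PoS tag by an underscore (for example `кот_NOUN`). Queries must therefore use the same form. The sketch below builds and splits such keys. It assumes the caller has already lemmatized the word (no lemmatizer is shown here) and that keys are lowercase, as in the output above. `make_key`, `split_key` and `neighbours_with_pos` are hypothetical helpers.

```python
def make_key(lemma, pos):
    """Build a query key such as "кот_NOUN" from a lemma and a Universal PoS tag.

    Assumes `lemma` is already lemmatized; lowercasing follows the keys shown above.
    """
    return "{}_{}".format(lemma.lower(), pos.upper())


def split_key(key):
    # rsplit keeps any underscore inside the lemma itself intact.
    lemma, pos = key.rsplit("_", 1)
    return lemma, pos


def neighbours_with_pos(results, pos):
    # Keep only neighbours carrying the requested PoS tag, e.g. "VERB".
    return [(key, score) for key, score in results if split_key(key)[1] == pos]
```

For instance, `neighbours_with_pos(model.most_similar(make_key("кот", "NOUN")), "VERB")` would keep only the verbs among the neighbours.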

quora-duplicate-questions

6 years ago

Over 400,000 lines of potential duplicate question pairs. Each line contains the IDs of both questions in the pair, the full text of each question, and a binary value indicating whether the pair is a duplicate.

attribute value
File size 21MB
Number of pairs 404290
License probably https://www.quora.com/about/tos

Read more:

Example

import gensim.downloader as api
import json

data = api.load("quora-duplicate-questions")

for question_pair in data:
    print(json.dumps(question_pair, indent=4))
    break

"""
Output:

{
    "qid1": "1",
    "question2": "What is the step by step guide to invest in share market?",
    "qid2": "2",
    "is_duplicate": "0",
    "question1": "What is the step by step guide to invest in share market in india?",
    "id": "0"
}
"""
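In the output above, every field, including `is_duplicate`, is a string (`"0"` or `"1"`). The label therefore needs converting before it is used as a boolean. The sketch below does that conversion and computes the share of duplicates in one pass. It assumes the field names shown above. `parse_pair` and `duplicate_rate` are hypothetical helpers.

```python
def parse_pair(record):
    """Return (question1, question2, is_duplicate) with a real boolean label.

    Assumes the field names and string-typed values shown in the output above.
    """
    return record["question1"], record["question2"], record["is_duplicate"] == "1"


def duplicate_rate(records):
    # Single pass over any iterable of records; returns 0.0 for an empty one.
    total = 0
    duplicates = 0
    for record in records:
        total += 1
        duplicates += parse_pair(record)[2]
    return duplicates / total if total else 0.0
```

Usage might look like `duplicate_rate(api.load("quora-duplicate-questions"))`, which streams the dataset once.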

wiki-english-20171001

6 years ago

word2vec-google-news-300

6 years ago

Pre-trained vectors trained on a part of the Google News dataset (about 100 billion words). The model contains vectors for 3 million words and phrases. The phrases were obtained using a simple data-driven approach described in "Distributed Representations of Words and Phrases and their Compositionality".
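The paper cited above scores each bigram as (count(w_i w_j) - delta) / (count(w_i) * count(w_j)). Pairs scoring above a threshold become single tokens, which is why keys such as `crown_prince` appear in the example output. The paper repeats this over several passes to form longer phrases. The sketch below shows a single pass on a toy corpus. The delta and threshold values are arbitrary illustrations, not the ones used for this model. gensim's `gensim.models.phrases.Phrases` provides a full implementation.

```python
from collections import Counter


def phrase_scores(sentences, delta):
    """Score adjacent word pairs as (count(a b) - delta) / (count(a) * count(b))."""
    unigrams = Counter(word for sentence in sentences for word in sentence)
    bigrams = Counter(pair for sentence in sentences for pair in zip(sentence, sentence[1:]))
    return {
        (a, b): (count - delta) / (unigrams[a] * unigrams[b])
        for (a, b), count in bigrams.items()
    }


def merge_phrases(sentence, phrases):
    """Join each adjacent pair found in `phrases` with an underscore, left to right."""
    merged, i = [], 0
    while i < len(sentence):
        if i + 1 < len(sentence) and (sentence[i], sentence[i + 1]) in phrases:
            merged.append(sentence[i] + "_" + sentence[i + 1])
            i += 2
        else:
            merged.append(sentence[i])
            i += 1
    return merged


# Toy corpus: "new york" co-occurs often enough to score above the threshold.
corpus = [["new", "york", "is", "big"], ["i", "love", "new", "york"], ["new", "york", "city", "is", "busy"]]
scores = phrase_scores(corpus, delta=1)
phrases = {pair for pair, score in scores.items() if score > 0.1}
print(merge_phrases(["i", "love", "new", "york"], phrases))  # ['i', 'love', 'new_york']
```

Subtracting delta discounts pairs that occur only once, so rare coincidental bigrams score zero here.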

Feature Description
File size 1.6GB
Number of vectors 3000000
Dimension 300

Read more:

Example

import gensim.downloader as api

model = api.load("word2vec-google-news-300")
model.most_similar(positive=["king", "woman"], negative=["man"])

"""
Output:

[(u'queen', 0.7118192911148071),
 (u'monarch', 0.6189674139022827),
 (u'princess', 0.5902431011199951),
 (u'crown_prince', 0.5499460697174072),
 (u'prince', 0.5377321243286133),
 (u'kings', 0.5236844420433044),
 (u'Queen_Consort', 0.5235945582389832),
 (u'queens', 0.518113374710083),
 (u'sultan', 0.5098593235015869),
 (u'monarchy', 0.5087411999702454)]

"""
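Passing `positive` and `negative` lists to `most_similar` performs the vector-offset analogy king - man + woman. The NumPy sketch below roughly mirrors what gensim's `KeyedVectors.most_similar` documents: each input vector is unit-normalised, positives are weighted +1 and negatives -1, the weighted vectors are averaged and normalised, and every word is ranked by cosine similarity to that mean. The query words themselves are skipped. The 3-dimensional toy vectors and the `analogy` helper are hypothetical.

```python
import numpy as np


def analogy(vectors, positive, negative, topn=1):
    """Rank words by cosine similarity to the normalised mean of unit
    positive vectors and negated unit negative vectors, excluding query words."""
    words = list(vectors)
    matrix = np.array([vectors[w] for w in words], dtype=float)
    unit = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    index = {w: i for i, w in enumerate(words)}
    weighted = [unit[index[w]] for w in positive] + [-unit[index[w]] for w in negative]
    query = np.mean(weighted, axis=0)
    query /= np.linalg.norm(query)
    sims = unit @ query  # cosine similarity, since both sides are unit length
    excluded = set(positive) | set(negative)
    ranked = [(words[i], float(sims[i])) for i in np.argsort(-sims) if words[i] not in excluded]
    return ranked[:topn]


# Hypothetical toy dimensions: (royalty, male, female).
toy = {
    "king": [1.0, 1.0, 0.0],
    "queen": [1.0, 0.0, 1.0],
    "man": [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
    "prince": [1.0, 1.0, 0.1],
}
print(analogy(toy, positive=["king", "woman"], negative=["man"]))  # queen ranks first
```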

__testing_multipart-matrix-synopsis

6 years ago

For testing purposes only!

Source : matrix-synopsis