How to remove stop words using NLTK or Python

Question

I have a dataset from which I would like to remove stop words using

stopwords.words('english')

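I'm struggling with how to use this in my code to simply take these words out. I already have a list of the words from this dataset; the part I'm stuck on is comparing against this list and removing the stop words. Any help is appreciated.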

Daren Thomas · Answer

from nltk.corpus import stopwords
# ...
# keep only the words that are not in the English stop word list
filtered_words = [word for word in word_list if word not in stopwords.words('english')]
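One thing worth noting about the comprehension above: it re-evaluates `stopwords.words('english')` for every word, and membership tests against a list are linear scans. A minimal sketch of the same filter with the stop words converted to a set once is shown below. To keep it runnable without the NLTK corpus, `SAMPLE_STOPWORDS` is a small hypothetical stand-in; in practice you would presumably pass `stopwords.words('english')` instead. The helper name `remove_stopwords` is also just illustrative.

```python
# Hypothetical stand-in for stopwords.words('english'); not the real NLTK list.
SAMPLE_STOPWORDS = ["a", "an", "the", "in", "is", "of"]


def remove_stopwords(words, stop_words):
    """Return the words that are not in stop_words, preserving order."""
    stop_set = set(stop_words)  # built once; set lookups are O(1) on average
    return [w for w in words if w not in stop_set]


word_list = ["the", "cat", "is", "in", "a", "box"]
filtered_words = remove_stopwords(word_list, SAMPLE_STOPWORDS)
print(filtered_words)  # ['cat', 'box']
```

For short lists the difference is negligible; it mostly matters when filtering a large corpus.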

David Lemphers · Answer

You could also do a set difference, for example:

list(set(nltk.regexp_tokenize(sentence, pattern, gaps=True)) - set(nltk.corpus.stopwords.words('english')))
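A caveat with the set-difference approach is that converting to a set discards duplicate tokens and the original word order. The sketch below contrasts it with an order-preserving filter. It uses a pre-split token list and a small hypothetical `SAMPLE_STOPWORDS` set in place of the NLTK tokenizer and stop word list, so it runs without NLTK installed.

```python
# Hypothetical stand-in for set(nltk.corpus.stopwords.words('english')).
SAMPLE_STOPWORDS = {"the", "a", "on"}

tokens = ["the", "cat", "sat", "on", "the", "cat", "mat"]

# Set difference: stop words are removed, but so are duplicates and ordering.
as_set = set(tokens) - SAMPLE_STOPWORDS

# List comprehension: every remaining token stays in its original position.
as_list = [t for t in tokens if t not in SAMPLE_STOPWORDS]

print(sorted(as_set))  # ['cat', 'mat', 'sat']
print(as_list)         # ['cat', 'sat', 'cat', 'mat']
```

If you only need the vocabulary, the set difference is fine; if you need the cleaned sentence or word counts, the list form is probably the safer choice.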

das_weezul · Answer

Suppose you have a list of words (word_list) from which you want to remove stop words. You could do something like this:

from nltk.corpus import stopwords

filtered_word_list = word_list[:]  # make a copy of the word_list
for word in word_list:  # iterate over word_list
    if word in stopwords.words('english'):
        filtered_word_list.remove(word)  # remove word from filtered_word_list if it is a stopword

sumitjainjr · Answer

To exclude all types of stop words, including the NLTK stop words, you could do something like this:

from stop_words import get_stop_words
from nltk.corpus import stopwords

stop_words = list(get_stop_words('en'))        # About 900 stopwords
nltk_words = list(stopwords.words('english'))  # About 150 stopwords
stop_words.extend(nltk_words)

output = [w for w in word_list if w not in stop_words]

Yugant Hadiyal · Answer

Use the textcleaner library to remove stop words from your data.

See the documentation here: https://yugantm.github.io/textcleaner/documentation.html#remove_stpwrds

To use this library, follow these steps:

pip install textcleaner

After installing:

import textcleaner as tc
data = tc.document(<file_name>)
# you can also pass list of sentences to the document class constructor.
data.remove_stpwrds()  # inplace is set to False by default

Use the code above to remove the stop words.

Mohammed_Ashour · Answer

You can use this function. Note that you need to lowercase all the words.

from nltk.corpus import stopwords

def remove_stopwords(word_list):
    processed_word_list = []
    for word in word_list:
        word = word.lower()  # in case they aren't all lower cased
        if word not in stopwords.words("english"):
            processed_word_list.append(word)
    return processed_word_list
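The function above returns the lowercased words. If you would rather keep the original capitalization in the output (for example, to preserve proper nouns), one possible variation is to lowercase only for the comparison. This is a sketch under assumptions: `SAMPLE_STOPWORDS` is a small hypothetical stand-in for `stopwords.words("english")`, and `remove_stopwords_keep_case` is an illustrative name, not part of NLTK.

```python
# Hypothetical stand-in for stopwords.words("english").
SAMPLE_STOPWORDS = {"the", "is", "a"}


def remove_stopwords_keep_case(words, stop_words):
    """Filter case-insensitively, but return words with their original casing."""
    stop_set = {s.lower() for s in stop_words}
    # Lowercase only for the membership test; the kept word is left untouched.
    return [w for w in words if w.lower() not in stop_set]


words = ["The", "Eiffel", "Tower", "is", "a", "landmark"]
result = remove_stopwords_keep_case(words, SAMPLE_STOPWORDS)
print(result)  # ['Eiffel', 'Tower', 'landmark']
```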

Saeid BK · Answer

Using filter:

from nltk.corpus import stopwords
# ...
filtered_words = list(filter(lambda word: word not in stopwords.words('english'), word_list))

Muhammad Yusuf · Answer

# 1) build a new list that contains only the non-stop words
print("enter the string from which you want to remove list of stop words")
userstring = input().split(" ")
stop_list = ["a", "an", "the", "in"]  # renamed from `list` to avoid shadowing the built-in
another_list = []
for x in userstring:
    if x not in stop_list:  # compare against the stop list and keep the rest
        another_list.append(x)  # it is also possible to use .remove
for x in another_list:
    print(x, end=' ')

# 2) if you want to use .remove, the code will be like this
print("enter the string from which you want to remove list of stop words")
userstring = input().split(" ")
stop_list = ["a", "an", "the", "in"]
for x in userstring[:]:  # iterate over a copy so that removing items does not skip elements
    if x in stop_list:
        userstring.remove(x)
for x in userstring:
    print(x, end=' ')
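A note on the `.remove` variant: removing items from a list while looping over that same list can skip the element that follows each removal, so adjacent stop words may survive. The sketch below illustrates the effect with a fixed sample sentence instead of `input()`, and compares it with looping over a copy. The variable names `buggy` and `safe` are only for illustration.

```python
stop_list = ["a", "an", "the", "in"]

# Looping over the list being modified: each removal shifts later items left,
# so the element right after a removed one is never visited.
words = "a an apple in the box".split(" ")
for x in words:
    if x in stop_list:
        words.remove(x)
buggy = words

# Looping over a copy (words[:]) visits every original element exactly once.
words = "a an apple in the box".split(" ")
for x in words[:]:
    if x in stop_list:
        words.remove(x)
safe = words

print(buggy)  # ['an', 'apple', 'the', 'box']
print(safe)   # ['apple', 'box']
```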