Commit 6da31c9

Author: bgalego
New version 1.2.0 with:
- Adds improvements for the response of the Language Identification API
- New classes for the management of the Request/Responses of APIs:
  - Deep Categorization
  - Text Clustering
  - Summarization

Tests, the README and setup files and the examples have been updated to reflect these changes.
1 parent c8f6b56 commit 6da31c9

20 files changed

Lines changed: 850 additions & 75 deletions

README.md

Lines changed: 12 additions & 6 deletions
Original file line number | Diff line number | Diff line change
@@ -39,23 +39,29 @@ And we are always available at support@meaningcloud.com
3939
This SDK currently contains the following:
4040

4141
- **Request**: manages requests to any of MeaningCloud's APIs. It can also be used to directly generate requests without using specific classes.
42-
- **LanguageRequest**: models a request to MeaningCloud Language Identification API.
43-
- **TopicsRequest**: models a request to MeaningCloud TopicsExtraction API.
4442
- **ClassRequest**: models a request to MeaningCloud Text Classification API.
45-
- **SentimentRequest**: models a request to MeaningCloud Sentiment Analysis API.
43+
- **ClusteringRequest**: models a request to MeaningCloud Text Clustering API.
44+
- **DeepCategorizationRequest**: models a request to MeaningCloud Deep Categorization API.
45+
- **LanguageRequest**: models a request to MeaningCloud Language Identification API.
4646
- **ParserRequest**: models a request to MeaningCloud Lemmatization, PoS and Parsing API.
47+
- **SentimentRequest**: models a request to MeaningCloud Sentiment Analysis API.
48+
- **SummarizationRequest**: models a request to MeaningCloud Summarization API.
49+
- **TopicsRequest**: models a request to MeaningCloud TopicsExtraction API.
4750
- **Response**: models a generic response from the MeaningCloud API.
48-
- **TopicsResponse**: models a response from the Topic Extraction API, providing auxiliary functions to work with the response, extracting the different types of topics and some of the most used fields in them.
4951
- **ClassResponse**: models a response from the Text Classification API, providing auxiliary functions to work with the response and extract the different fields in each category.
50-
- **SentimentResponse**: models a response from the Sentiment Analysis API, providing auxiliary functions to work with the response and extract the sentiment detected at different levels and for different elements.
52+
- **ClusteringResponse**: models a response from the Text Clustering API, providing auxiliary functions to work with the response and extract the different fields in each cluster.
53+
- **DeepCategorizationResponse**: models a response from the Deep Categorization API, providing auxiliary functions to work with the response and extract the different fields in each category.
5154
- **LanguageResponse**: models a response from the Language Identification API, providing auxiliary functions to work with the response and extract the languages detected, along with their codes and names.
5255
- **ParserResponse**: models a response from the Lemmatization, PoS and Parsing API, providing auxiliary functions to work with the response and extract the lemmatization and PoS tagging of the text provided.
56+
- **SentimentResponse**: models a response from the Sentiment Analysis API, providing auxiliary functions to work with the response and extract the sentiment detected at different levels and for different elements.
57+
- **SummarizationResponse**: models a response from the Summarization API, providing auxiliary functions to work with the response and obtain the summary extracted.
58+
- **TopicsResponse**: models a response from the Topic Extraction API, providing auxiliary functions to work with the response, extracting the different types of topics and some of the most used fields in them.
5359

5460
### Usage
5561

5662
In the _example_ folder, there are two examples:
5763
- **Client.py**, which contains a simple example on how to use the SDK
58-
- **mc_showcase**, which implements a pipeline where plain text files are read from a folder, and two CSV files result as output: one with several types of analyses done over each text, and the results from running Text Clustering over the complete collection.
64+
- **mc_showcase**, which implements a pipeline where plain text files are read from a folder, and two CSV files result as output: one with several types of analyses done over each text, and the results from running [Text Clustering](https://www.meaningcloud.com/developer/text-clustering) over the complete collection.
5965
The analyses done are:
6066

6167
* [Language Identification](https://www.meaningcloud.com/developer/language-identification): detects the language and returns code or name

example/Client.py

Lines changed: 8 additions & 7 deletions
@@ -10,7 +10,7 @@
1010
model = 'IAB_en'
1111

1212
# @param license_key - Your license key (found in the subscription section in https://www.meaningcloud.com/developer/)
13-
license_key = '<your_license_key>'
13+
license_key = '<<<<< your license key >>>>>'
1414

1515
# @param text - Text to use for different API calls
1616
text = 'London is a very nice city but I also love Madrid.'
@@ -33,7 +33,7 @@
3333
topics_response.getTypeLastNode(topics_response.getOntoType(entity)) + "\n")
3434

3535
else:
36-
print("\nOh no! There was the following error: " + topics_response.getStatusMsg() + "\n")
36+
print("\tNo entities detected!\n")
3737
else:
3838
if topics_response.getResponse() is None:
3939
print("\nOh no! The request sent did not return a Json\n")
@@ -60,11 +60,12 @@
6060
# If there are no errors in the request, we will use the language detected to make a request to Sentiment and Topics
6161
if lang_response.isSuccessful():
6262
print("\nThe request to 'Language Identification' finished successfully!\n")
63-
64-
results = lang_response.getResults()
65-
if 'language_list' in results.keys() and results['language_list']:
66-
language = results['language_list'][0]['language']
67-
print("\tLanguage detected: " + results['language_list'][0]['name'] + ' (' + language + ")\n")
63+
languages = lang_response.getLanguages()
64+
if languages:
65+
language = lang_response.getLanguageCode(languages[0])
66+
print("\tLanguage detected: " + lang_response.getLanguageName(languages[0]) + ' (' + language + ")\n")
67+
else:
68+
print("\tNo language detected!\n")
6869

6970
# We are going to make a request to the Lemmatization, PoS and Parsing API
7071
parser_response = meaningcloud.ParserResponse(
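
Note on the language hunk above: the removed lines indexed `results['language_list'][0]['language']` and `['name']` directly, while the new lines go through `getLanguages()`, `getLanguageCode()` and `getLanguageName()`. The implementation of those helpers is not part of this diff, so the following is only a minimal standalone sketch of the same guarded lookup on a plain dictionary. The function name `first_language` and the sample payloads are hypothetical.

```python
def first_language(results):
    """Return (code, name) of the top-ranked language, or None if none was detected."""
    # Treat a missing key and an explicit None the same way as an empty list
    languages = results.get('language_list') or []
    if not languages:
        return None
    top = languages[0]
    return top.get('language', ''), top.get('name', '')


# Hypothetical payloads, shaped like the fields the removed code indexed directly
detected = {'language_list': [{'language': 'en', 'name': 'English'}]}
print(first_language(detected))
print(first_language({'language_list': []}))
print(first_language({}))
```

Returning `None` for the empty case mirrors the new `else` branch in the example, which prints "No language detected!" instead of failing on an empty list.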

example/mc_showcase.py

Lines changed: 18 additions & 35 deletions
@@ -47,13 +47,13 @@ def getSentimentAnalysis(text):
4747
# Calls Language Detection and returns the code or name for the text
4848
def detectLanguage(text, get_name=False):
4949
language = ''
50-
# We are going to make a request to the Sentiment Analysis API
50+
# We are going to make a request to the Language Identification API
5151
print("\tDetecting language...")
5252
lang_response = meaningcloud.LanguageResponse(meaningcloud.LanguageRequest(license_key, txt=text).sendReq())
5353
if lang_response.isSuccessful():
54-
langs = lang_response.getResults()['language_list']
54+
langs = lang_response.getLanguages()
5555
if langs:
56-
language = langs[0]['language'] if not get_name else langs[0]['name']
56+
language = lang_response.getLanguageCode(langs[0]) if not get_name else lang_response.getLanguageName(langs[0])
5757
else:
5858
print("\tOops! Request to detect language was not successful: (" + lang_response.getStatusCode() + ') ' + lang_response.getStatusMsg())
5959
return language
@@ -114,14 +114,10 @@ def getDeepCategorization(text, model, num_cats):
114114
# We are going to make a request to the Deep Categorization API
115115
formatted_categories = ''
116116
print("\tGetting " + model[0:len(model) - 3].replace('_', ' ') + " analysis...")
117-
deepcat = meaningcloud.Request(url="https://api.meaningcloud.com/deepcategorization-1.0", key=license_key)
118-
deepcat.addParam('model', model)
119-
deepcat.setContentTxt(text)
120-
deepcat_response = meaningcloud.Response(deepcat.sendRequest())
117+
deepcat_response = meaningcloud.DeepCategorizationResponse(meaningcloud.DeepCategorizationRequest(license_key, model=model, txt=text).sendReq())
121118
if deepcat_response.isSuccessful():
122-
cat_results = deepcat_response.getResults()
123-
categories = cat_results['category_list'] if (('category_list' in cat_results.keys()) and (cat_results['category_list'] is not None)) else {}
124-
formatted_categories = (', '.join(cat['label'] + ' (' + cat['relevance'] +')' for cat in categories[:num_cats])) if categories else '(none)'
119+
categories = deepcat_response.getCategories()
120+
formatted_categories = (', '.join(deepcat_response.getCategoryLabel(cat) + ' (' + deepcat_response.getCategoryRelevance(cat) +')' for cat in categories[:num_cats])) if categories else '(none)'
125121
else:
126122
print("\tOops! Request to Deep Categorization was not successful: (" + deepcat_response.getStatusCode() + ') ' + deepcat_response.getStatusMsg())
127123

@@ -135,7 +131,7 @@ def getTextClassification(text, model, num_cats):
135131
class_response = meaningcloud.ClassResponse(meaningcloud.ClassRequest(license_key, txt=text, model=model, otherparams={'txtf': 'markup'}).sendReq())
136132
if class_response.isSuccessful():
137133
categories = class_response.getCategories()
138-
formatted_categories = (', '.join(class_response.getCategoryLabel(cat) + ' (' + class_response.getCategoryRelevance(cat) +')' for cat in categories[:num_cats])) if categories else '(none)'
134+
formatted_categories = (', '.join(class_response.getCategoryLabel(cat) + ' (' + class_response.getCategoryRelevance(cat) +')' for cat in categories[:num_cats])) if categories else '(none)'
139135
else:
140136
print("\tOops! The request to Text Classification was not successful: (" + class_response.getStatusCode() + ') ' + class_response.getStatusMsg())
141137

@@ -144,15 +140,12 @@ def getTextClassification(text, model, num_cats):
144140

145141
# Calls Summarization and obtains an extractive summary with the number of sentences specified
146142
def getSummarization(text, sentences):
147-
# We are going to make a request to the Deep Categorization API
143+
# We are going to make a request to the Summarization API
148144
summary = ''
149145
print("\tGetting automatic summarization...")
150-
summarization = meaningcloud.Request(url="https://api.meaningcloud.com/summarization-1.0", key=license_key)
151-
summarization.addParam('sentences', sentences)
152-
summarization.setContentTxt(text)
153-
summarization_response = meaningcloud.Response(summarization.sendRequest())
146+
summarization_response = meaningcloud.SummarizationResponse(meaningcloud.SummarizationRequest(license_key, sentences=sentences, txt=text).sendReq())
154147
if summarization_response.isSuccessful():
155-
summary = summarization_response.getResults()['summary']
148+
summary = summarization_response.getSummary()
156149
else:
157150
print("\tOops! Request to Summarization was not successful: (" + summarization_response.getStatusCode() + ') ' + summarization_response.getStatusMsg())
158151

@@ -164,22 +157,12 @@ def getClustering(text_collection):
164157

165158
# We are going to make a request to the Clustering API
166159
print("Getting clustering analysis...")
167-
clustering = meaningcloud.Request(url="https://api.meaningcloud.com/clustering-1.1", key=license_key)
168-
clustering.addParam('lang','en')
169-
clustering.addParam('mode','tm')
170-
texts = "\r\n".join(val.replace("\r", ' ').replace("\n", " ").replace("\f", " ") for val in text_collection.values())
171-
ids = "\r\n".join(text_collection.keys())
172-
clustering.setContentTxt(texts)
173-
clustering.addParam('id', ids)
174-
175-
clustering_response = meaningcloud.Response(clustering.sendRequest())
176-
160+
clustering_response = meaningcloud.ClusteringResponse(meaningcloud.ClusteringRequest(license_key, lang='en', texts=text_collection).sendReq())
177161
if clustering_response.isSuccessful():
178-
results = clustering_response.getResults()
179-
clusters = results['cluster_list'] if (('cluster_list' in results.keys()) and (results['cluster_list'] is not None)) else {}
180-
titles = [cl['title'] for cl in clusters]
181-
sizes = [cl['size'] for cl in clusters]
182-
scores = [float(cl['score']) for cl in clusters]
162+
clusters = clustering_response.getClusters()
163+
titles = [clustering_response.getClusterTitle(cl) for cl in clusters]
164+
sizes = [clustering_response.getClusterSize(cl) for cl in clusters]
165+
scores = [clustering_response.getClusterScore(cl) for cl in clusters]
183166
docs = [', '.join(cl['document_list'].keys()) for cl in clusters]
184167
return titles, sizes, scores, docs
185168
else:
@@ -244,7 +227,7 @@ def analyzeText(text, fibo=False):
244227
# read files
245228
input_files = {}
246229
for file_name in os.listdir('./' + input_folder):
247-
f = open(input_folder + '/' + file_name)
230+
f = open(input_folder + '/' + file_name, 'r', encoding='utf-8', errors='ignore')
248231
if f.mode == 'r':
249232
input_files[file_name] = f.read()
250233

@@ -259,12 +242,12 @@ def analyzeText(text, fibo=False):
259242
df[label_list] = df['Text'].apply(analyzeText, fibo=get_fibo)
260243
df.to_csv('./' + output_file + '.csv', index_label='File_name')
261244
print("Results printed to '"+ output_file + ".csv'!")
262-
#print(df)
245+
# print(df)
263246

264247

265248
# Cluster all files
266249
resulting_clusters = getClustering(input_files)
267250
df_clusters = pd.DataFrame( {'Cluster_Name': resulting_clusters[0], 'Size': resulting_clusters[1], 'Score': resulting_clusters[2], 'Documents': resulting_clusters[3]})
268251
df_clusters.to_csv('./' + output_file + '_clusters.csv', index_label='Cluster_ID')
269252
print("Clustering results printed to '"+ output_file + "_clusters.csv'!")
270-
#print(df_clusters)
253+
# print(df_clusters)
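
Note on the file-reading hunk above: the new `open(...)` call sets `encoding='utf-8'` and `errors='ignore'`, but the handle is still never closed. A possible variant, sketched here with only the standard library, wraps the read in a `with` block. The helper name `read_text_files` and the throwaway sample file are hypothetical and not part of the commit.

```python
import os
import tempfile


def read_text_files(folder):
    """Map each file name in `folder` to its text, skipping undecodable bytes."""
    contents = {}
    for file_name in sorted(os.listdir(folder)):
        path = os.path.join(folder, file_name)
        if not os.path.isfile(path):
            continue  # ignore sub-folders
        # The context manager closes the file even if read() raises
        with open(path, 'r', encoding='utf-8', errors='ignore') as f:
            contents[file_name] = f.read()
    return contents


# Small self-check with a throwaway folder
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, 'a.txt'), 'wb') as f:
        f.write(b'caf\xc3\xa9 \xff ok')  # valid UTF-8 plus one stray byte
    sample_files = read_text_files(tmp)
    print(sample_files)
```

With `errors='ignore'` the stray byte is dropped silently, which keeps the pipeline running but can hide encoding problems in the input folder.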

meaningcloud/ClusteringRequest.py

Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
1+
import meaningcloud.Request
2+
3+
4+
class ClusteringRequest(meaningcloud.Request):
5+
6+
URL = 'https://api.meaningcloud.com/clustering-1.1'
7+
otherparams = None
8+
extraheaders = None
9+
type_ = ""
10+
11+
def __init__(self, key, lang, texts, mode='tm', otherparams=None, extraheaders=None):
12+
"""
13+
ClusteringRequest constructor
14+
15+
:param key:
16+
license key
17+
:param lang:
18+
language of the text
19+
:param texts:
20+
Collection of texts to cluster. Dictionary expected where the keys are the IDs of the text/doc
21+
:param mode:
22+
Clustering algorithm
23+
:param otherparams:
24+
Array where other params can be added to be used in the API call
25+
:param extraheaders:
26+
Array where other headers can be added to be used in the request
27+
"""
28+
29+
self._params = {}
30+
meaningcloud.Request.__init__(self, self.URL, key)
31+
self.otherparams = otherparams
32+
self.extraheaders = extraheaders
33+
self._url = self.URL
34+
35+
self.addParam('key', key)
36+
self.addParam('lang', lang)
37+
self.addParam('mode', mode)
38+
self.addParam('txt', "\r\n".join(val.replace("\r", ' ').replace("\n", " ").replace("\f", " ") for val in texts.values()))
39+
self.addParam('id', "\r\n".join(texts.keys()))
40+
41+
if (otherparams):
42+
for key in otherparams:
43+
self.addParam(key, otherparams[key])
44+
45+
def sendReq(self):
46+
return self.sendRequest(self.extraheaders)
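
Note on the constructor above: the `txt` and `id` parameters are built so that line *n* of one corresponds to line *n* of the other, which is why line breaks inside each text are replaced by spaces before joining. The sketch below isolates that step as a plain function. The name `build_clustering_fields` and the sample documents are hypothetical, and it assumes the dictionary keys are already strings.

```python
def build_clustering_fields(texts):
    """Turn {doc_id: text} into the line-aligned 'txt' and 'id' strings."""
    def one_line(value):
        # Each text must stay on a single line, so inner breaks become spaces
        return value.replace("\r", " ").replace("\n", " ").replace("\f", " ")

    return {
        'txt': "\r\n".join(one_line(v) for v in texts.values()),
        'id': "\r\n".join(texts.keys()),
    }


# Hypothetical two-document collection
fields = build_clustering_fields({'doc1': "first\ntext", 'doc2': "second text"})
print(repr(fields['txt']))
print(repr(fields['id']))
```

Both strings are derived from the same dictionary in the same iteration order, so the IDs stay aligned with their texts.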

meaningcloud/ClusteringResponse.py

Lines changed: 86 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,86 @@
1+
import meaningcloud.Response
2+
3+
4+
class ClusteringResponse(meaningcloud.Response):
5+
6+
def __init__(self, response):
7+
"""
8+
ClusteringResponse constructor
9+
10+
:param response:
11+
String returned by the request
12+
"""
13+
14+
if not response:
15+
raise Exception("The request sent did not return a response")
16+
meaningcloud.Response.__init__(self, response)
17+
18+
def getClusters(self):
19+
"""
20+
Get clusters found for the texts sent
21+
22+
:return:
23+
Array with the clusters detected
24+
"""
25+
26+
return (self._response['cluster_list']
27+
if (('cluster_list' in self._response.keys()) and (self._response['cluster_list'] is not None))
28+
else {})
29+
30+
# Generic auxiliary functions
31+
32+
def getClusterTitle(self, cluster):
33+
"""
34+
Get the title of a cluster
35+
36+
:param cluster:
37+
Cluster you want the title from
38+
:return:
39+
Cluster title
40+
"""
41+
42+
return (cluster['title']
43+
if ((len(cluster) > 0) and ('title' in cluster.keys()) and (cluster['title'] is not None))
44+
else "")
45+
46+
def getClusterSize(self, cluster):
47+
"""
48+
Get the size of a cluster
49+
50+
:param cluster:
51+
Cluster you want the size from
52+
:return:
53+
Cluster size
54+
"""
55+
56+
return (cluster['size']
57+
if ((len(cluster) > 0) and ('size' in cluster.keys()) and (cluster['size'] is not None))
58+
else "")
59+
60+
def getClusterScore(self, cluster):
61+
"""
62+
Get the score of a cluster
63+
64+
:param cluster:
65+
Cluster you want the score from
66+
:return:
67+
Cluster score
68+
"""
69+
70+
return (cluster['score']
71+
if ((len(cluster) > 0) and ('score' in cluster.keys()) and (cluster['score'] is not None))
72+
else "")
73+
74+
def getClusterDocuments(self, cluster):
75+
"""
76+
Get the list of documents in a cluster
77+
78+
:param cluster:
79+
Cluster you want the documents from
80+
:return:
81+
Documents in the cluster
82+
"""
83+
84+
return (cluster['document_list']
85+
if ((len(cluster) > 0) and ('document_list' in cluster.keys()) and (cluster['document_list'] is not None))
86+
else {})
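
Note on the accessors above: each getter repeats the same three checks (non-empty cluster, key present, value not `None`) before falling back to a default. A more compact equivalent, shown here only as a sketch on plain dictionaries, uses `dict.get`. The helper name `cluster_field` and the sample cluster are hypothetical. The sketch assumes each cluster carries its own `document_list`, as the showcase's `cl['document_list'].keys()` suggests.

```python
def cluster_field(cluster, name, default=""):
    """Return cluster[name], or `default` when the key is missing or None."""
    value = cluster.get(name) if cluster else None
    return default if value is None else value


# Hypothetical cluster with one null field
sample_cluster = {
    'title': 'Cities',
    'size': 2,
    'score': None,
    'document_list': {'1': 'London', '2': 'Madrid'},
}

print(cluster_field(sample_cluster, 'title'))
print(repr(cluster_field(sample_cluster, 'score')))
print(sorted(cluster_field(sample_cluster, 'document_list', default={}).keys()))
```

Passing the default explicitly keeps the string-versus-dictionary fallback visible at the call site.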
Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
1+
import meaningcloud.Request
2+
3+
4+
class DeepCategorizationRequest(meaningcloud.Request):
5+
6+
URL = 'https://api.meaningcloud.com/deepcategorization-1.0'
7+
otherparams = None
8+
extraheaders = None
9+
type_ = ""
10+
11+
def __init__(self, key, model, txt=None, url=None, doc=None, polarity='n', otherparams=None, extraheaders=None):
12+
"""
13+
DeepCategorizationRequest constructor
14+
15+
:param key:
16+
license key
17+
:param txt:
18+
Text to use in the API calls
19+
:param url:
20+
Url to use in the API calls
21+
:param doc:
22+
File to use in the API calls
23+
:param model:
24+
Name of the model to use in the classification
25+
:param polarity:
26+
Determines if categories will contain an associated polarity value.
27+
:param otherparams:
28+
Array where other params can be added to be used in the API call
29+
:param extraheaders:
30+
Array where other headers can be added to be used in the request
31+
"""
32+
33+
self._params = {}
34+
meaningcloud.Request.__init__(self, self.URL, key)
35+
self.otherparams = otherparams
36+
self.extraheaders = extraheaders
37+
self._url = self.URL
38+
39+
self.addParam('key', key)
40+
self.addParam('model', model)
41+
self.addParam('polarity', polarity)
42+
43+
if txt:
44+
type_ = 'txt'
45+
elif doc:
46+
type_ = 'doc'
47+
elif url:
48+
type_ = 'url'
49+
else:
50+
type_ = 'default'
51+
52+
options = {'doc': lambda: self.setContentFile(doc),
53+
'url': lambda: self.setContentUrl(url),
54+
'txt': lambda: self.setContentTxt(txt),
55+
'default': lambda: self.setContentTxt(txt)
56+
}
57+
options[type_]()
58+
if (otherparams):
59+
for key in otherparams:
60+
self.addParam(key, otherparams[key])
61+
62+
def sendReq(self):
63+
return self.sendRequest(self.extraheaders)
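
Note on the constructor above: the content source is chosen with a fixed priority (`txt`, then `doc`, then `url`), and the `'default'` branch falls back to `setContentTxt(txt)` even when `txt` is empty. The standalone sketch below reproduces only that selection order, without touching the SDK. The function name `pick_content_source` and the sample arguments are hypothetical.

```python
def pick_content_source(txt=None, url=None, doc=None):
    """Return (kind, value) following the constructor's txt > doc > url priority."""
    if txt:
        return 'txt', txt
    if doc:
        return 'doc', doc
    if url:
        return 'url', url
    # Mirrors the 'default' branch, which also ends up setting the text content
    return 'txt', txt


print(pick_content_source(txt='London is nice', url='https://example.com'))
print(pick_content_source(url='https://example.com', doc='report.pdf'))
print(pick_content_source())
```

When several sources are passed at once, only the highest-priority one is used and the others are ignored without warning.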
