Machine Learning – 24 – Multi-class classification

Multi-class classification

When there are several classes rather than just the two classes 0 and 1, predicting which class a new item belongs to is called multi-class classification. One logistic prediction is performed per class, so there are as many logistic models as there are classes. A new item is then scored by all of them and assigned to the class it fits best.

In the example below, the rings fall into four classes: red, purple, green and yellow.

First, a hypothesis is built to predict red. Here h(x) = 1 denotes red, and everything that is not red is labelled 0.

Next, a hypothesis is built to predict purple. Here h(x) = 1 denotes purple, and everything that is not purple is labelled 0.

Hypotheses are built for the remaining colours in the same way.
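The per-colour hypotheses described above can be sketched roughly as follows. This is only an illustration: the 2-D toy data, the cluster centres and the names `colours`, `models` and `scores` are all assumptions made for the example, and scikit-learn's `LogisticRegression` stands in for the logistic hypothesis h(x).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: 30 two-dimensional points around one centre per colour.
rng = np.random.default_rng(0)
colours = ["red", "purple", "green", "yellow"]
centres = np.array([[0, 0], [5, 0], [0, 5], [5, 5]])
X = np.vstack([centre + rng.normal(scale=0.5, size=(30, 2)) for centre in centres])
y = np.repeat(colours, 30)

# One binary logistic model per colour: the target is 1 for that colour, 0 for all others.
models = {}
for colour in colours:
    binary_target = (y == colour).astype(int)
    models[colour] = LogisticRegression().fit(X, binary_target)

# Every model scores the same new point; each score is that model's estimate of h(x) = 1.
new_point = np.array([[4.8, 0.3]])
scores = {colour: model.predict_proba(new_point)[0, 1] for colour, model in models.items()}
print(scores)
```

Because each model is trained on its own 0/1 target, the four scores do not have to add up to 1.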

Then, suppose a new ring arrives and the probability of it being predicted as red is 30%, as purple 40%, as green 60% and as yellow 50%. It is assigned to the class with the highest probability, which here is green. This is multi-class classification.
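The selection step is just "take the class with the largest score". A minimal sketch using the four percentages from the text (the dictionary and variable names are made up for the illustration):

```python
# Scores reported by the four per-colour hypotheses for the new ring.
probabilities = {"red": 0.30, "purple": 0.40, "green": 0.60, "yellow": 0.50}

# Pick the class whose hypothesis gives the highest score.
predicted_class = max(probabilities, key=probabilities.get)
print(predicted_class)  # green
```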

Decision tree, Gaussian NB, KNN and SVC are algorithms that support this kind of multi-class problem. They are used as follows.

	from sklearn.metrics import confusion_matrix
	from sklearn.metrics import precision_recall_fscore_support
	import pandas as pd
	from sklearn.model_selection import train_test_split
	from sklearn.tree import DecisionTreeClassifier
	from sklearn.svm import SVC
	from sklearn.neighbors import KNeighborsClassifier
	from sklearn.naive_bayes import GaussianNB

	df = pd.read_csv('./flowers.csv')
	X = df[list(df.columns)[:-1]]
	y = df['Flower']
	X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 0)

	tree = DecisionTreeClassifier(max_depth = 2).fit(X_train, y_train)
	tree_predictions = tree.predict(X_test)
	print (tree.score(X_test, y_test))
	print (confusion_matrix(y_test, tree_predictions))
	print (precision_recall_fscore_support(y_test, tree_predictions))

	svc = SVC(kernel = 'linear', C = 1).fit(X_train, y_train)
	svc_predictions = svc.predict(X_test)
	print (svc.score(X_test, y_test))
	print (confusion_matrix(y_test, svc_predictions))
	print (precision_recall_fscore_support(y_test, svc_predictions))

	knn = KNeighborsClassifier(n_neighbors = 7).fit(X_train, y_train)
	knn_predictions = knn.predict(X_test)
	print (knn.score(X_test, y_test))
	print (confusion_matrix(y_test, knn_predictions))
	print (precision_recall_fscore_support(y_test, knn_predictions))

	gnb = GaussianNB().fit(X_train, y_train)
	gnb_predictions = gnb.predict(X_test)
	print (gnb.score(X_test, y_test))
	print (confusion_matrix(y_test, gnb_predictions))
	print (precision_recall_fscore_support(y_test, gnb_predictions))

Output:

0.8947368421052632
[[15 1 0]
[ 3 6 0]
[ 0 0 13]]
(array([0.83333333, 0.85714286, 1. ]), array([0.9375, 0.66666667, 1. ]), array([0.88235294, 0.75, 1. ]), array([16, 9, 13], dtype=int64))

0.9736842105263158
[[15 1 0]
[ 0 9 0]
[ 0 0 13]]
(array([1. , 0.9, 1. ]), array([0.9375, 1. , 1. ]), array([0.96774194, 0.94736842, 1. ]), array([16, 9, 13], dtype=int64))

1.0
[[16 0 0]
[ 0 9 0]
[ 0 0 13]]
(array([1., 1., 1.]), array([1., 1., 1.]), array([1., 1., 1.]), array([16, 9, 13], dtype=int64))
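To see how the numbers in each output block relate to one another, the sketch below recomputes accuracy, precision, recall, F-score and support by hand from the first confusion matrix shown above (presumably the decision tree's, since it is printed first). It assumes scikit-learn's convention that rows are true classes and columns are predicted classes; the variable names are only illustrative.

```python
import numpy as np

# First confusion matrix from the output above (rows: true class, columns: predicted class).
cm = np.array([[15, 1, 0],
               [3, 6, 0],
               [0, 0, 13]])

true_positives = np.diag(cm)
precision = true_positives / cm.sum(axis=0)  # correct / everything predicted as that class
recall = true_positives / cm.sum(axis=1)     # correct / everything truly in that class
f1 = 2 * precision * recall / (precision + recall)
support = cm.sum(axis=1)                     # number of true samples per class
accuracy = true_positives.sum() / cm.sum()

print(accuracy)
print(precision, recall, f1, support)
```

These values should match the score and the four arrays printed by `precision_recall_fscore_support` in the first output block.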

Next is a MultinomialNB example that uses the words in a customer complaint to predict which category the complaint falls under.

	import pandas as pd
	from io import StringIO
	import matplotlib.pyplot as plt
	from sklearn.feature_extraction.text import TfidfVectorizer
	from sklearn.feature_selection import chi2
	import numpy as np
	from sklearn.model_selection import train_test_split
	from sklearn.feature_extraction.text import CountVectorizer
	from sklearn.feature_extraction.text import TfidfTransformer
	from sklearn.naive_bayes import MultinomialNB

	# on_bad_lines='skip' replaces error_bad_lines=False, which was removed in pandas 2.0
	df = pd.read_csv('./Consumer_Complaints.csv', sep=',', on_bad_lines='skip', index_col=False, dtype='unicode')
	df = df[pd.notnull(df['Issue'])]

	fig = plt.figure(figsize=(8,6))
	df.groupby('Product').Issue.count().plot.bar(ylim=0)
	plt.show()

	X_train, X_test, y_train, y_test = train_test_split(df['Issue'], df['Product'], random_state = 0)
	# Keep the fitted vectorizer and TF-IDF transformer so that new text gets the same preprocessing as the training data
	c = CountVectorizer()
	t = TfidfTransformer()
	clf = MultinomialNB().fit(t.fit_transform(c.fit_transform(X_train)), y_train)

	print(clf.predict(t.transform(c.transform(["This company refuses to provide me verification and validation of debt per my right under the FDCPA. I do not believe this debt is mine."]))))

	tfidf = TfidfVectorizer(sublinear_tf=True, min_df=5, norm='l2', encoding='latin-1', ngram_range=(1, 2), stop_words='english')
	features = tfidf.fit_transform(df.Issue).toarray()
	print (features)
	df['category_id'] = df['Product'].factorize()[0]
	pro_cat = df[['Product', 'category_id']].drop_duplicates().sort_values('category_id')
	print (pro_cat)
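	for product, category_id in sorted(dict(pro_cat.values).items()):
	    # Rank every feature by its chi-squared score against "belongs to this category" (ascending order)
	    indices = np.argsort(chi2(features, df.category_id == category_id)[0])
	    print (indices)
	    # get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it
	    feature_names = np.array(tfidf.get_feature_names_out())[indices]
	    unigrams = [name for name in feature_names if len(name.split(' ')) == 1]
	    bigrams = [name for name in feature_names if len(name.split(' ')) == 2]
	    print(">", product)
	    # Scores are sorted in ascending order, so the most strongly associated terms are at the end
	    print("unigrams:", ','.join(unigrams[-5:]))
	    print("bigrams:", ','.join(bigrams[-5:]))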

For this, a bar chart is first plotted to see how many complaints are supplied for training under each product.

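The complaints are then split into a training set and a test set. Since train_test_split is called without a test_size argument, scikit-learn's default 75-25 split applies; a 70-30 split would require test_size=0.3.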
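As a quick check on the split ratio, the sketch below compares `train_test_split` called without `test_size` (as in the code above) with an explicit `test_size=0.3`. The 100 sample values and their labels are made up purely for the illustration.

```python
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 samples with alternating 0/1 labels.
samples = list(range(100))
labels = [i % 2 for i in samples]

# No test_size given: scikit-learn holds out 25% of the samples for testing.
train_default, test_default, _, _ = train_test_split(samples, labels, random_state=0)

# An explicit 70-30 split.
train_70, test_30, _, _ = train_test_split(samples, labels, test_size=0.3, random_state=0)

print(len(train_default), len(test_default))  # 75 25
print(len(train_70), len(test_30))            # 70 30
```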

Here TfidfVectorizer turns the words in each complaint into features; with ngram_range=(1, 2) these are both single words and two-word pairs. chi2 then scores how strongly each feature is associated with each individual category. Finally, the single-word features and the two-word features associated with each category are collected under the names unigrams and bigrams.
