ANALYSIS OF LARGE-SCALE IMAGES USING APACHE SPARK (APACHE SPARK KULLANILARAK BÜYÜK BOYUTLU GÖRÜNTÜLERİN ANALİZİ)

DOLAPCI, BETÜL

URI: http://acikerisim.karabuk.edu.tr:8080/xmlui/handle/123456789/866

Date: 2020-08-07

Abstract:

Today's digital transformation and the easy access to the internet brought about by its global spread have made it possible to produce data of every kind (image, sound, video, text, etc.) in high volumes. Because of the size, irregularity, and diversity of the data produced, analyzing it and extracting meaning from it are becoming increasingly difficult. Dividing image data into smaller parts and representing it with a vector of distinctive, independent features obtained from those parts makes the analysis easier. For this reason, a block division method is applied first, which splits the image data into small pixel blocks. The feature vector then reduces the dimensionality of the large data so that it can be expressed in a smaller form. In this study, a hybrid feature vector was created for the analysis of images, the data type used. The hybrid vector, formed by combining sub-features extracted from the color and texture features of the images, was used to classify the images with machine learning methods. The MLlib library of Apache Spark was used for classification. Experimental studies were carried out on ship images shared on the Kaggle platform using the Naive Bayes, Decision Tree, and Random Forest methods in this library. The purpose of this thesis is to classify ship images with Apache Spark's MLlib library using the hybrid vector obtained through feature extraction methods. The results of the experiments are presented in graphs and tables and are analyzed and discussed in detail.
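
As an illustration of the block division and hybrid feature vector described in the abstract, the following is a minimal sketch in plain Python. The abstract does not state which color and texture descriptors the thesis uses, so the per-channel mean (color) and the standard deviation of gray intensity (texture) are assumptions chosen for brevity. The function names, the image representation (a list of rows of RGB tuples), and the block size are likewise hypothetical.

```python
from statistics import mean, pstdev


def split_into_blocks(image, block_size):
    """Split an image (list of rows of (r, g, b) tuples) into
    non-overlapping square blocks of block_size x block_size pixels.
    Edge pixels that do not fill a whole block are dropped (an assumption)."""
    height, width = len(image), len(image[0])
    blocks = []
    for top in range(0, height - block_size + 1, block_size):
        for left in range(0, width - block_size + 1, block_size):
            block = [
                image[y][x]
                for y in range(top, top + block_size)
                for x in range(left, left + block_size)
            ]
            blocks.append(block)
    return blocks


def block_features(block):
    """Return [mean R, mean G, mean B, gray std dev] for one block.
    These descriptors are illustrative stand-ins, not the thesis's features."""
    # Color sub-features: mean of each RGB channel.
    color = [mean(pixel[c] for pixel in block) for c in range(3)]
    # Texture sub-feature: spread of gray intensity within the block.
    gray = [sum(pixel) / 3 for pixel in block]
    texture = [pstdev(gray)]
    return color + texture


def hybrid_vector(image, block_size):
    """Concatenate the color and texture features of every block."""
    vector = []
    for block in split_into_blocks(image, block_size):
        vector.extend(block_features(block))
    return vector


if __name__ == "__main__":
    # A tiny 4x4 single-color image, split into four 2x2 blocks.
    image = [[(10, 20, 30)] * 4 for _ in range(4)]
    print(hybrid_vector(image, 2))
```

A vector built this way is far smaller than the raw pixel data: a 4x4 RGB image holds 48 values, while the sketch above yields 16. Paired with a class label, such a vector is the kind of input that classifiers like Naive Bayes, Decision Trees, and Random Forest operate on.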
