CSCI 5561: Assignment #3 Scene Recognition



2 Overview

Figure 1: You will design a visual recognition system to classify the scene categories.

The goal of this assignment is to build a set of visual recognition systems that classify scene categories. The scene classification dataset consists of 15 scene categories, including office, kitchen, and forest, as shown in Figure 1 [1]. The system will compute a set of image representations (tiny image and bag-of-word visual vocabulary) and predict the category of each testing image using classifiers (k-nearest neighbor and SVM) built on the training data. Simple pseudo-code for the recognition system can be found below:

Algorithm 1 Scene Recognition

1: Load training and testing images

2: Build image representation

3: Train a classifier using the representations of the training images

4: Classify the testing data.

5: Compute accuracy of testing data classification.

For the kNN classifier, steps 3 and 4 can be combined.

3 Scene Classification Dataset

You can download the training and testing data from the homework 3 page on Canvas.

The data folder includes two text files (train.txt and test.txt) and two folders (train and test). Each row in a text file specifies an image and its label, i.e., (label) (image path). The text files can be used to load the images. Each folder contains scene images from 15 classes (Kitchen, Store, Bedroom, LivingRoom, Office, Industrial, Suburb, InsideCity, TallBuilding, Street, Highway, OpenCountry, Coast, Mountain, Forest).

Note: the image paths inside train.txt and test.txt were recorded in Windows format (they use \ instead of /). If you use Linux or Mac, you may need Path and PureWindowsPath from pathlib to handle them. You do not need to worry about this, though, since we have provided a function called extract_dataset_info that reads the information from those two text files for you.
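As a rough illustration of the path conversion mentioned in the note, one row of train.txt could be parsed as sketched here. The helper name parse_dataset_line and the sample row are hypothetical and only assume the "(label) (image path)" layout described above; the provided extract_dataset_info remains the function to use in your submission.

```python
from pathlib import Path, PureWindowsPath


def parse_dataset_line(line, root="."):
    """Split one '(label) (image path)' row and convert the Windows path.

    Hypothetical helper for illustration; not part of the provided code.
    """
    label, win_path = line.strip().split(maxsplit=1)
    # PureWindowsPath understands backslash separators on any OS;
    # its parts are then re-joined with the local separator.
    parts = PureWindowsPath(win_path).parts
    return label, Path(root).joinpath(*parts)


# Sample row in the assumed format (made up for this example).
label, img_path = parse_dataset_line(r"Kitchen train\Kitchen\image_0001.jpg")
print(label, img_path.as_posix())
```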

4 Tiny Image kNN Classification

(a) Image

(b) Tiny Image

Figure 2: You will use tiny image representation to get an image feature.

def get_tiny_image(img, output_size):
    return feature

Input: img is a grayscale image, output_size=(w, h) is the size of the tiny image.

Output: feature is the tiny image representation obtained by vectorizing the pixel intensities. Its length is w×h.

Description: You will simply resize each image to a small, fixed resolution (e.g., 16×16). You need to normalize the resulting feature so that it has zero mean and unit length. This is not a particularly good representation, because it discards all of the high-frequency image content and is not especially invariant to spatial or brightness shifts.
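The normalization step can be sketched with NumPy alone. This is a minimal illustration, not a reference solution: the helper name normalize_tiny is hypothetical, "unit length" is taken to mean unit L2 norm, and the random array stands in for an image that has already been resized to output_size (for example with cv2.resize).

```python
import numpy as np


def normalize_tiny(tiny):
    """Vectorize a small image and give it zero mean and unit length."""
    feature = np.asarray(tiny, dtype=np.float64).ravel()
    feature = feature - feature.mean()   # zero mean
    norm = np.linalg.norm(feature)
    if norm > 0:                         # guard against constant images
        feature = feature / norm         # unit length (assumed: L2 norm)
    return feature


# Stand-in for a 16x16 resized image (random values, for illustration only).
rng = np.random.default_rng(0)
tiny = rng.integers(0, 256, size=(16, 16))
feature = normalize_tiny(tiny)
print(feature.shape)
```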

def predict_kNN(feature_train, label_train, feature_test, k):
    return label_test_pred

Input: feature_train is an ntr × d matrix, where ntr is the number of training samples and d is the dimension of the image feature, e.g., 256 for the 16×16 tiny image representation. Each row is one image feature. label_train is a vector of length ntr whose entries, in [1, 15], specify the labels of the training data. feature_test is an nte × d matrix that contains the testing features, where nte is the number of testing samples. k is the number of neighbors used for label prediction.

Output: label_test_pred is a vector of length nte that specifies the predicted labels for the testing data.

Description: You will use a k-nearest neighbor classifier to predict the label of the testing data.
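One possible shape for such a classifier is sketched below with NumPy. The function name knn_predict_sketch is hypothetical, and two choices are assumptions rather than requirements of the assignment: Euclidean distance as the metric, and breaking voting ties in favor of the smallest label.

```python
import numpy as np


def knn_predict_sketch(feature_train, label_train, feature_test, k):
    """Majority vote among the k nearest training samples (Euclidean)."""
    feature_train = np.asarray(feature_train, dtype=np.float64)
    feature_test = np.asarray(feature_test, dtype=np.float64)
    label_train = np.asarray(label_train)
    # Pairwise squared distances, shape (n_te, n_tr), using
    # ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2.
    d2 = (
        (feature_test ** 2).sum(axis=1)[:, None]
        - 2.0 * feature_test @ feature_train.T
        + (feature_train ** 2).sum(axis=1)[None, :]
    )
    nearest = np.argsort(d2, axis=1)[:, :k]       # indices of the k closest
    preds = []
    for row in label_train[nearest]:
        values, counts = np.unique(row, return_counts=True)
        preds.append(values[np.argmax(counts)])   # ties go to smallest label
    return np.array(preds)


# Toy data: two well-separated clusters with labels 1 and 2.
train = [[0, 0], [0, 1], [5, 5], [5, 6]]
labels = [1, 1, 2, 2]
test = [[0.2, 0.1], [5.1, 5.4]]
preds = knn_predict_sketch(train, labels, test, k=3)
print(preds)
```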

Figure 3: Confusion matrix for Tiny+kNN.

def classify_kNN_tiny(label_classes, label_train_list,
                      img_train_list, label_test_list, img_test_list):
    return confusion, accuracy

Input: label_classes is a list of all class names, img_train_list and img_test_list are lists of paths to the training and testing images, and label_train_list and label_test_list are the corresponding lists of image scene labels.

Output: confusion is a 15 × 15 confusion matrix and accuracy is the accuracy of the testing data prediction.

Description: You will combine get_tiny_image and predict_kNN for scene classification. Your goal is to achieve accuracy >18%.
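The two outputs can be computed from true and predicted labels roughly as follows. This sketch assumes the labels have already been mapped to integer class indices (for instance with label_classes.index(name)) and that each row of the confusion matrix is normalized by the number of test samples in that class; the handout does not state either convention, and the helper name confusion_and_accuracy is hypothetical.

```python
import numpy as np


def confusion_and_accuracy(label_true, label_pred, n_classes):
    """Row-normalized confusion matrix and overall accuracy.

    Assumes labels are integer class indices in [0, n_classes).
    """
    confusion = np.zeros((n_classes, n_classes))
    for t, p in zip(label_true, label_pred):
        confusion[t, p] += 1                  # row: true, column: predicted
    accuracy = np.trace(confusion) / confusion.sum()
    row_sums = confusion.sum(axis=1, keepdims=True)
    confusion = confusion / np.maximum(row_sums, 1)  # avoid division by zero
    return confusion, accuracy


# Toy example with 3 classes instead of 15.
confusion, accuracy = confusion_and_accuracy([0, 0, 1, 2], [0, 1, 1, 2], 3)
print(accuracy)
```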

Note: We have provided a function called extract_dataset_info, which takes the path to the dataset directory and outputs label_classes, label_train_list, img_train_list, label_test_list, and img_test_list for you (these are also the input arguments to the functions classify_kNN_bow and classify_svm_bow). To make your life and ours easier, please make sure you use that function.

5 Bag-of-word Visual Vocabulary

Figure 4: Each row represents a distinctive cluster from bag-of-word representation.

def compute_dsift(img, stride, size):
    return dense_feature

Input: img is a grayscale image. stride and size are integers: stride controls the spacing of the image locations at which SIFT features are computed, and size is the diameter of the meaningful keypoint neighborhood.

Output: dense_feature is a collection of SIFT features of size n×128, where n is the total number of locations on img at which SIFT features are computed.

Description: Given an image, instead of detecting keypoints and computing a SIFT descriptor for each, this function directly computes SIFT descriptors on a dense set of locations on the image. You can use the SIFT-related functions from OpenCV to compute the SIFT descriptor at each location.
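The dense set of locations can be generated independently of OpenCV. The sketch below is one possible layout, under the assumption that each size-by-size patch should fit entirely inside the image; the helper name dense_grid is hypothetical. Each (x, y) pair could then be wrapped in a cv2.KeyPoint with the given size and passed to the compute method of a SIFT object created with cv2.SIFT_create(), which is left out here so the example runs without OpenCV.

```python
import numpy as np


def dense_grid(height, width, stride, size):
    """Centers (x, y) of a regular grid whose size-by-size patches fit the image."""
    half = size // 2
    ys = np.arange(half, height - half + 1, stride)
    xs = np.arange(half, width - half + 1, stride)
    # Row-major order: all x positions for the first y, then the next y, ...
    return [(float(x), float(y)) for y in ys for x in xs]


# A 64x48 (height x width) image sampled every 16 pixels with 16-pixel patches.
points = dense_grid(height=64, width=48, stride=16, size=16)
print(len(points))  # n, the number of rows dense_feature would have
```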

def build_visual_dictionary(dense_feature_list, d_size):
    return vocab

Input: dense_feature_list is a list of dense sift feature representation of training images (each image is represented as a n x 128 array) and d_size is the size of the dictionary (the number of visual words). Function compute_dsift is provided to extract dense sift features from an image.

Output: vocab lists the quantized visual words whose size is d_size×128.

Description: Given a list of dense SIFT feature representations of the training images, you will build a visual dictionary made of quantized SIFT features. You may start with d_size=50. You can use the KMeans function imported from sklearn.cluster. You may visualize the image patches to make sense of the clustering, as shown in Figure 4.

Algorithm 2 Visual Dictionary Building

1: For each image, compute dense SIFT over regular grid

2: Build a pool of SIFT features from all training images

3: Find cluster centers from the SIFT pool using the k-means algorithm.

4: Return the cluster centers.

Note: It takes more than half an hour to build the bag-of-word visual vocabulary if you use the default parameters of the KMeans function (n_init=10, max_iter=300). You may want to play around with those parameters and use np.savetxt to save the current vocab if you think it is good. You can then use np.loadtxt to load the saved vocab in the future to save time.
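The save-and-reload pattern from the note might look like the following. Both helper names are hypothetical, and the values n_init=1 and max_iter=100 are illustrative choices rather than recommended settings. The scikit-learn import is kept inside build_vocab_kmeans so that the caching part can be tried on its own with a stand-in builder.

```python
import os
import tempfile

import numpy as np


def build_vocab_kmeans(pool, d_size, n_init=1, max_iter=100):
    """Cluster an (N, 128) SIFT pool; parameter values are illustrative."""
    from sklearn.cluster import KMeans  # imported lazily; needs scikit-learn
    kmeans = KMeans(n_clusters=d_size, n_init=n_init, max_iter=max_iter)
    return kmeans.fit(pool).cluster_centers_


def load_or_build_vocab(path, build_fn):
    """Return the vocab stored at path, building and saving it if absent."""
    if os.path.exists(path):
        return np.loadtxt(path, ndmin=2)
    vocab = build_fn()
    np.savetxt(path, vocab)
    return vocab


# Demonstrate the caching with a tiny stand-in vocab instead of real k-means.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vocab.txt")
    first = load_or_build_vocab(path, lambda: np.arange(6.0).reshape(2, 3))
    # The file now exists, so this builder is never called.
    second = load_or_build_vocab(path, lambda: np.zeros((2, 3)))
print(second)
```

In the assignment, the builder passed in would be something like lambda: build_vocab_kmeans(np.vstack(dense_feature_list), 50), so the expensive clustering runs only when no saved file is found.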

