LaunchToast

Programming A Driverless Car Chapter 6: Clustering

Clustering

Contents

Clustering Analysis-
Clustering algorithm helps us to cluster the objects based on any common feature and arrange them into different clusters.
The different aspects that it makes clusters can be anything depending on shape, color or size.

Fig: Various types of objects

 
For example, here we see different objects. The objects like apple and a box have same color. And therefore it can be clustered based on the color category. They can also be clustered on the basis of shapes. The algorithm takes up the patterns of the objects and cluster them based on the similar pattern.
How do we know how much clusters we will be needing?
It depends on the minimum threshold value. Once that value is achieved we won’t make more clusters.
 
What is clustering?
Clustering is the classification of objects into different groups, or more precisely, the partitioning of data set into subsets (Clusters) so that the data in each subset (ideally) shares some common trait- often according to some defined distance measure.
Types of clustering are:

  1. Hierarchical Clustering-

    A. Agglomerative (bottom-up)
    -Computes all pair-wise pattern-pattern similarity coefficients. Ie. the difference between objects within cluster is minimum.
    -Place each of n patterns into a class of its own.
    -Merge the two most similar clusters into one – Replaces the two clusters into the new cluster and re-computes inter-cluster similarity scores w.r.t. The new cluster.
    -Repeat the above step until there are k clusters left.(k can be 1)
    -Each data is assigned as an individual cluster. K is having the maximum value and from there we go to a minimum value for k. Minimum value of k can be minimum distance between a cluster.

    Agglomerative Clustering (Bottom Up)-
    The nearest 2 points are merged to form a cluster.
    In this we combine the two closest points to form the first cluster. Similarly, we keep on forming clusters until we reach to a k point where all clusters are done.
    So, when you get merge the similar and close points, you can make clusters completely.
    Finally these k clusters are left.

    B. Divisive Approach (top-down)
    – Start at the top with all the patterns in one cluster
    – The cluster is split using a flat clustering algorithm
    – This procedure is applied recursively until each pattern is in its own singleton cluster.

    We have k=2, then we split it to k=4, and then k=6. From k=1 to k= maximum. We apply hierarchical approach only for small data sets and not for large data sets.

 

  1. K means clustering
    – Each cluster is associated with a centroid.
    – Each data object is assigned to closet centroid.
    – The centroid of each cluster is then updated based on the data objects assignment to the cluster.
    – Repeat the assignment and update steps until convergence.
  2. Fuzzy C-Clustering:
    – An extension of k-means.
    – Hierarchical, k means generates partitions. Each data object is assigned to closest centroid.
    – Fuzzy-c means allow data points to be assigned into more than one cluster. Each data point has a degree of membership of belonging to each other.
    – It is frequently used in pattern recognition. It is based on minimization of the following object function.

    Fuzzy c means algorithm:
    1. Let x(i) be a vector of values for data point g(i). Initialize membership U(0)= [u(ij)] for data point g(i) of cluster cl(j) by random. 
    2. At the kth step, compute the fuzzy centroid C(k) =[c(j)] for j=1,….,nc, where nc is the number of clusters, using 

        where m is the fuzzy parameter and n is the number of data points.
        3. Update the fuzzy membership U(k) = [u(ij)] , using
                                     

  1. If ||U(k) – U(k-1)|| < , then STOP, else return to step2.
           5. Determine membership cutoff – For each data point g assign g(i) to cluster cl(j) if u(ij) of               U(k)>

    Application of Clustering-

Multiple Clusters-
– Clustering
– K means Clustering Algorithm
– Example
– How to choose value of k?
– Applications of K means algorithm

We know that we want to try and partition data points  into the set of clusters based on the feature space. The data points that retain the closest point are merged into a cluster.

K means Clustering technique-
1. Choose k initial centroids- Suppose k=3.
2. Each cluster is associated with a centroid

  1. Each data object is assigned to closest centroid.
  2. The centroid of each cluster is then updated based on the data objects assignment to the cluster.
  3. Repeat the assignment and update steps until convergence.


For example : K=3.
Step1: Make random assignments and compute centroids (big dots)

Step2: Assign points to the nearest centroids.

Step3: Recompute centroids (in this example, solution is now stable)
K means setup-

The method works:

Step1 : Initialization-


Step2: Assigning elements to any of the cluster-

Step3: Assigning elements to new clusters according to distance:

Step4: Convergence

How to choose value of k?

  1. How do we choose K depends on the distance between elements of a particular cluster and centroid of the same minimizes. This is one more clustering method called EM.
  2. Run algorithm on data with several different values of k.
  3. Use the prior knowledge about the characteristics of the problem.

Applications of K means clustering-

  1. It is relatively faster and efficient. It computes result at 0(t n), where n is the number of objects or points, k is the number of clusters and t is the number of iterations.
  2. K-means clustering can be applied to machine learning or data mining.
  3. Used on acoustic data in speech understanding to convert waveforms into one of k categories (known as Vector Quantization for Image Segmentation)
  4. Also used for choosing color palettes on old fashioned graphical display devices and Image Quantization.

    This means k-means algorithm is useful for undirected knowledge discovery and is relatively simple. K-means has found widespread usage in lot of fields, ranging from unsupervised learning of neural networks, pattern recognitions, Classification analysis, Artificial Intelligence, image processing, machine vision and many others.

 
PCA Algorithm:

Applications in Self Driving-

For getting your hands on Image processing and real time Image Acquistion using OpenCV:

  1. First you need to install Open CV if it is not already installed. Open your command prompt, and then just type this:
  2. Once it is installed, we can simply check it by typing
    >>> import cv2
    (if it shows no error that means it is installed)
    >>> import numpy
    >>> import matplotlib (these 2 libraries are needed)
    If it is done, we can start with the programming:
    import cv2
    import numpy
    #reading image from the computer
    img=cv2.imread(r’C:\images\image2.png,1) ----- If you want to see this window in grayscale mode then just change 1 to 0.
    cv2.imshow(‘Image’,img)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

    ——————————————————————————–
    import cv2
    import numpy
    #reading image from the computer
    img=cv2.imread(r’C:\images\image2.png,1)

    #if you need to design your own window, then change the frame dimensions

    cv2.namedWindow(‘Image’,cv2.WINDOW_NORMAl)
    cv2.imshow(‘Image’,img)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
    ---------------------------------------------------------------------------------------------
    import cv2
    import numpy
    #reading image from the computer
    img=cv2.imread(r’C:\images\image2.png,1)
    cv2.namedWindow(‘Image’,cv2.WINDOW_NORMAl)
    #if you need the image output
    cv2.imwrite(‘C:\images\output.jpg’,img)
    cv2.imshow(‘Image’,img)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

    —————————————————————————————
    If we want to work with real time images:
    cap=cv2.VideoCapture(0)
    while TRUE:
    ret,frame=cap.read()
    cv2.imshow(‘frame’, frame)
    if cv2.waitKey(1) && 0*FF==ord(‘q’):
    break
    cap.release()
    cv2.destroyAllWindows()
    ----------------------------------------------------------------------------------
    # For Drawing anything on the image:
    import cv2
    import numpy
    img=cv2.imread(r’C:\images\image2.png,1)
    cv2.namedWindow(‘Image’,cv2.WINDOW_NORMAl)
    #add a line to the image
    img=cv2.line(img,(10,120), (400, 500),(255,0,0),5)--- first the source coordinates, then the destination coordinates, color and the width.
    #add a rectangle to the image
    img=cv2.rectangle(img,(50,0),(200,180),(0,0,255),3)
    cv2.imshow(‘Image’,img)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

    ————————————————————————————
    For edge detection:

    import cv2
    import numpy
    import matplotlib.pyplot as plt
    img=cv2.imread(r’C:\images\image4.jpg’,0)
    edges=cv2.Canny(img,100,200)
    plt.subplot(121),plt.imshow(img,cmpa=’gray’)
    plt.title(‘original Image’),plt.xticks([]).plt.yticks([])
    plt.subplot(122),plt.imshow(edges,cmap=’gray’)
    plt.title(‘Edge Image’),plt.xticks([]),plt.xticks([])
    plt.show()

    ————————————————————————————
    For corner Detection:
    We will use Harry’s corner algorithm-
    import cv2
    import numpy

    img=cv2.imread(r’C:\images\chessboard.png)
    gray=cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
    gray=numpy.float32(gray)
    dst=cv2.cornerHarris(gray,2,3,0.04)
    dst=cv2.dilate(dst,NONE)
    #threshold for a particular value
    img[dst>0.01*dst.max()]=[0,0,255]
    cv2.imshow(‘dst’,img)
    cv2.waitkey(0)
    cv2.destroyAllWindows()

    This will help in image detection, object detection, feature detection using opencv and python!
  3. Color Quantization using K means Clustering:
    We will use the similar syntax and k means clustering for color quantization. This can be defined as reducing the number of colors in an image to reduce the memory. For color quantization we will use a command- cv.kmeans() – input parameters.
    A. samples: It should be of np.float32 data type, and each feature should be put in a single column.
    B. nclusters(K): number of clusters required at the end
    C. criteria: It is the iteration termination criteria. When this criteria is satisfied, algorithm iteration stops. Actually, it should be a tuple of 3 parameters. They are (type, max_iter, epsilon): Termination criteria has 3 flags-
    cv2.TERM_CRITERIA_EPS- Stop the algorithm iteration if specified accuracy, epsilon, is reached.
    cv2.TERM_CRITERIA_MAX_ITER- stops the algorithm after the specified number of iterations, max_iter.
    cv2.TERM_CRITERIA_EPS+TERM_CRITERIA_MAX_ITER- stop the iteration when any of the above condition is met.
    max_iter- An integer specifying maximum number of iterations
    epsilon- Required Accuracy

    D. attempts: Flag to specify the number of times the algorithm is executed using different initial labellings.
    E.flags: This flag is used to specify how initial centers are taken. Normally two flags are used for this: cv2.KMEANS_PP_CENTERS and cv2.KMEANS_RANDOM_CENTERS
    cv.kmeans()-Output Parameters
    A. Compactness- It is the sum of squared distance from each point to their corresponding centers.
    B. labels: This is the label array, where each element marked ‘0’,’1’…
    C. centers : This is the array of centers of clusters.

    import numpy
    import cv2
    img=cv2.imread(r’C:\images\img1.jpg’)
    img2=img.reshape((-1,3)
    #converting to numpy.float32
    img2=numpy.float32(img2)
    #define the criteria
    criteria=(cv2.TERM_CRITERIA_EPS+cv2.TERM_CRITERIA_MAX_ITER,10,1.0)
    k1=8
    ret,label1,center1=cv2.kmeans(img2,k1,None,criteria,10,cv2.KMEANS_RANDOM_CENTERS)
    center1=numpy.uint8(center1)
    res1=center1[label1.flatten]
    res1final=res1.reshape((img.shape))

    #define a window for image
    cv2.namedWindow(‘Kmeans Clustering’,cv2.WINDOW_NORMAl)
    final1=numpy.concatenate((img,res1final),axis=1)
    cv.imshow(‘Kmeans Clustering’,final1)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

    This was clustering with k=8 and we can do with more cluster too
    import numpy
    import cv2
    img=cv2.imread(r’C:\images\img1.jpg’)
    img2=img.reshape((-1,3)

    #converting to numpy.float32
    img2=numpy.float32(img2)

    #define the criteria
    criteria=(cv2.TERM_CRITERIA_EPS+cv2.TERM_CRITERIA_MAX_ITER,10,1.0)
    k1=2
    ret,label1,center1=cv2.kmeans(img2,k1,None,criteria,10,cv2.KMEANS_RANDOM_CENTERS)
    center1=numpy.uint8(center1)
    res1=center1[label1.flatten]
    res1final=res1.reshape((img.shape))
    #define a window for image
    cv2.namedWindow(‘Kmeans Clustering’,cv2.WINDOW_NORMAl)
    final1=numpy.concatenate((img,res1final),axis=1)
    cv.imshow(‘Kmeans Clustering’,final1)
    cv2.waitKey(0)
    cv2.destroyAllWindows()


    FACE DETECTION AND OBJECT DETECTION USING OPEN CV
    import numpy
    import cv2
    fc=cv2.CascadeClassifier(r‘C:\images\haarcascade_frontalface_default.xml’)
    ec=cv2.CascadeClassifier(r‘C:\images\haarcascade_eye.xml’)
    img=cv2.imread(r‘C:\images\man.jpeg’)
    gray=cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
    faces=fc.detectMultiScale(gray,1.3,5)
    for(x,y,w,h) in faces:
    img=cv2.rectangle(img,(x,y),(x+w),(y+h),(255,0,0),2)
    roi_gray=gray[y:y+h,x:x+w]
    roi_color=img[y:y+h,x:x+w]
    eyes=ec.detectMultiScale(roi_Gray)
    for(ex,ey,eh) in eyes:
    cv2.rectangle(roi_color, (ex,ey),(ex+ew,ey+eh),(0,0,255),2)
    cv.imshow(‘Face detection’,img)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

 
Recommended Reading: Chapter 5: Linear Regression
Next: Programming A Driverless Car Chapter 7: Natural Language Processing
To start reading from a topic of your choice, you can go back to the Table of Contents here
This course evolves with your active feedback. Do let us know your feedback in the comments section below.
Looking for jobs in Artificial Intelligence? Check here

Exit mobile version