• What are the “filter” and “wrapper” methods in feature selection?

• What are the advantages and disadvantages of each?

• What are the common methods to impute missing data?

• What’s the problem associated with learning from imbalanced datasets? What are the



• Explain the bottom-up and top-down approaches in hierarchical clustering.

• How do you construct a dendrogram from a given data set (HW)?

• What are single-link, complete-link, average-link, and centroid-link?

• How does the k-means algorithm work?

• What is SSE in k-means?

• How to select the optimal k?

• What is the k-medoids method? What’s its advantage over k-means?

• How to measure clustering quality (internal: scatter criteria; external: precision, recall, etc.)?
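As a study aid for the k-means and SSE questions above, here is a minimal pure-Python sketch of the usual assign/update loop. The toy points, the starting centroids, and the function names `kmeans` and `sse` are illustrative assumptions, not taken from the course material.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))


def kmeans(points, initial_centroids, max_iter=100):
    """Plain k-means: repeat assignment and update until labels stop changing."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels


# Made-up toy data with two visually obvious groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(points, initial_centroids=[(1.0, 1.0), (8.0, 8.0)])
total_sse = sse(points, centroids, labels)
print(labels, round(total_sse, 4))
```

Running the same loop for several values of k and plotting SSE against k is one common way to look for an "elbow" when choosing k.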

Apriori Algorithm

• What is association rule mining? What are its applications?

• What is a strong association rule?

• How to calculate support, confidence of a given association rule?

• What is the apriori property?

• How to generate C_i’s and L_i’s?

• How to evaluate whether a strong association rule is interesting: lift? What’s the implication of lift > 1, = 1, < 1?
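For the support, confidence, and lift questions above, a small worked sketch may help. It uses the standard definitions: support(X) is the fraction of transactions containing X, confidence(A → B) = support(A ∪ B) / support(A), and lift(A → B) = confidence(A → B) / support(B). The transaction list and function names are made up for illustration.

```python
# Made-up toy market-basket data; each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(A and B together) divided by support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """confidence(A -> B) divided by support(B)."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


# Rule under study: {diapers} -> {beer}
rule_support = support({"diapers", "beer"}, transactions)
rule_confidence = confidence({"diapers"}, {"beer"}, transactions)
rule_lift = lift({"diapers"}, {"beer"}, transactions)
print(rule_support, rule_confidence, rule_lift)
```

In this toy data the rule has support 3/5, confidence 3/4, and lift 1.25. A lift above 1 suggests the two sides occur together more often than independence would predict, a lift of 1 suggests independence, and a lift below 1 suggests a negative association.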


• What is a sequential pattern?

• What is the apriori property in sequence mining?

• How to generate the C_i’s and L_i’s?

Time series

• What are the four components of a time series?

• What is an auto-regression model? How to construct lagged predictors?

• Why do we need DTW?

• What are the 3 properties a warping path must follow?

• How to find the DTW given a cost matrix?
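For the DTW questions above, the sketch below shows one common dynamic-programming formulation: fill an accumulated-cost matrix, then backtrack from the last cell to the first. It assumes the textbook step pattern (moves of one cell right, down, or diagonally), which is what enforces the boundary, monotonicity, and continuity conditions usually required of a warping path. The function name and the two short series are illustrative assumptions.

```python
def dtw_from_cost_matrix(cost):
    """Return (total DTW cost, warping path) for a local cost matrix."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    acc = [[inf] * m for _ in range(n)]
    acc[0][0] = cost[0][0]  # boundary: the path starts at (0, 0)

    # Each cell adds its local cost to the cheapest of its three predecessors.
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best_prev = min(
                acc[i - 1][j] if i > 0 else inf,
                acc[i][j - 1] if j > 0 else inf,
                acc[i - 1][j - 1] if i > 0 and j > 0 else inf,
            )
            acc[i][j] = cost[i][j] + best_prev

    # Backtrack from the last cell (boundary: the path ends at (n-1, m-1)).
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((acc[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((acc[i][j - 1], i, j - 1))
        _, i, j = min(candidates)  # cheapest predecessor; ties favor the diagonal
        path.append((i, j))
    path.reverse()
    return acc[n - 1][m - 1], path


# Made-up series: y repeats one value of x, so a perfect warp exists.
x = [1, 2, 3]
y = [1, 2, 2, 3]
cost = [[abs(a - b) for b in y] for a in x]
distance, path = dtw_from_cost_matrix(cost)
print(distance, path)
```

Here the path matches the single 2 in x to both 2s in y, which is the kind of time-axis stretching that a point-by-point Euclidean comparison cannot express.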

