[Image: /blogs/images/7.jpg]

Useful Notes

While the group-sparsity constraint forces our model to only consider a few features, these features are largely used across all tasks. All of the previous approaches thus assume that the tasks used in multi-task learning are closely related. However, each task might not be closely related to all of the available tasks. In those cases, sharing information with an unrelated task might actually hurt performance, a phenomenon known as negative transfer.
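To make the group-sparsity idea concrete, below is a minimal sketch (not from the original post) of an $\ell_{2,1}$ penalty on a shared weight matrix, where each row corresponds to a feature and each column to a task: the penalty encourages whole rows to be zero, so a feature is either used or dropped across all tasks. The names `l21_penalty`, `mtl_objective`, and the regularization weight `lam` are illustrative assumptions.

```python
import numpy as np

def l21_penalty(W):
    """Sum of row-wise L2 norms of the weight matrix W (features x tasks).

    Because the L2 norm couples all tasks in a row, driving the penalty down
    tends to zero out entire rows, i.e. entire features, across every task.
    """
    return np.sum(np.sqrt(np.sum(W ** 2, axis=1)))

def mtl_objective(W, X_list, y_list, lam=0.1):
    """Squared-error multi-task loss plus the group-sparsity term.

    X_list[t] has shape (n_t, d), y_list[t] has shape (n_t,); W[:, t] are the
    weights of task t. `lam` trades off data fit against feature sparsity.
    """
    loss = sum(np.mean((X @ W[:, t] - y) ** 2)
               for t, (X, y) in enumerate(zip(X_list, y_list)))
    return loss + lam * l21_penalty(W)
```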

L2 Loss and Gaussian Distribution

The sentence "Minimizing L2 loss comes from the assumption that the data is drawn from a Gaussian distribution" is taken from the paper Deep Multi-Scale Video Prediction Beyond Mean Square Error. The following is my understanding, based on some useful resources. Suppose we have a dataset $\mathcal{X}=\{\mathbf{x}^{(1)},\mathbf{x}^{(2)}, \cdots, \mathbf{x}^{(m)}\}$. The data comes from an unknown distribution $p_{data}(\mathbf{x})$, and we use a model $p_{model}(\mathbf{x}; \theta)$, parametrized by $\theta$, to approximate $p_{data}(\mathbf{x})$.
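As a short worked derivation (under the added assumption that the model density is Gaussian with mean $\boldsymbol{\mu}_{\theta}$ and fixed variance $\sigma^2$ in $d$ dimensions; the symbols $\boldsymbol{\mu}_{\theta}$, $\sigma$, and $d$ are not in the original text), maximum likelihood estimation gives

$$
\begin{aligned}
\theta_{ML} &= \arg\max_{\theta} \sum_{i=1}^{m} \log p_{model}(\mathbf{x}^{(i)}; \theta) \\
&= \arg\max_{\theta} \sum_{i=1}^{m} \left( -\frac{1}{2\sigma^2} \left\| \mathbf{x}^{(i)} - \boldsymbol{\mu}_{\theta} \right\|_2^2 - \frac{d}{2}\log\left(2\pi\sigma^2\right) \right) \\
&= \arg\min_{\theta} \sum_{i=1}^{m} \left\| \mathbf{x}^{(i)} - \boldsymbol{\mu}_{\theta} \right\|_2^2.
\end{aligned}
$$

The constant term and the factor $\frac{1}{2\sigma^2}$ do not change the minimizer, so maximizing the Gaussian likelihood is exactly minimizing the L2 (squared-error) loss.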

About Big Data

Top Journals and Conferences

Based on online searches and the general consensus among peers, the top conference in the data mining field is KDD (ACM SIGKDD Conference on Knowledge Discovery and Data Mining), which is widely recognized as one of the highest-ranked venues.