Learning Python: a detailed guide to sklearn.manifold's TSNE (overview, usage, and implementation)


A brief introduction to manifold learning


Manifold learning is an approach to non-linear dimensionality reduction. Algorithms for this task are based on the idea that the dimensionality of many data sets is only artificially high.


High-dimensional datasets can be very difficult to visualize. While data in two or three dimensions can be plotted to show the inherent structure of the data, equivalent high-dimensional plots are much less intuitive. To aid visualization of the structure of a dataset, the dimension must be reduced in some way.

The simplest way to accomplish this dimensionality reduction is by taking a random projection of the data. Though this allows some degree of visualization of the data structure, the randomness of the choice leaves much to be desired. In a random projection, it is likely that the more interesting structure within the data will be lost.

To address this concern, a number of supervised and unsupervised linear dimensionality reduction frameworks have been designed, such as Principal Component Analysis (PCA), Independent Component Analysis, Linear Discriminant Analysis, and others. These algorithms define specific rubrics to choose an “interesting” linear projection of the data. These methods can be powerful, but often miss important non-linear structure in the data.

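As a concrete, hedged illustration of that difference, the sketch below compares a purely random 2-D projection with a PCA projection. The digits dataset and the variable names are assumptions made for this example, not something taken from the original text.

# Sketch: random projection vs. PCA on the 64-dimensional digits data.
# Dataset choice and variable names are illustrative assumptions.
from sklearn.datasets import load_digits
from sklearn.random_projection import GaussianRandomProjection
from sklearn.decomposition import PCA

X, y = load_digits(return_X_y=True)          # X has shape (1797, 64)

X_rand = GaussianRandomProjection(n_components=2, random_state=0).fit_transform(X)
X_pca = PCA(n_components=2).fit_transform(X)

print(X_rand.shape, X_pca.shape)             # (1797, 2) (1797, 2)
# Plotting both embeddings colored by digit label usually shows that the PCA
# projection preserves more visible class structure than the random one.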

Manifold learning can be thought of as an attempt to generalize linear frameworks like PCA to be sensitive to non-linear structure in data. Though supervised variants exist, the typical manifold learning problem is unsupervised: it learns the high-dimensional structure of the data from the data itself, without the use of predetermined classifications.
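To see what "sensitive to non-linear structure" buys in practice, here is a minimal sketch, assuming the toy S-curve dataset from sklearn.datasets (an assumption made for the example), that embeds the same 3-D manifold with PCA and with t-SNE:

# Sketch: PCA vs. t-SNE on a non-linear toy manifold (the 3-D S-curve).
from sklearn.datasets import make_s_curve
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, color = make_s_curve(n_samples=1000, random_state=0)   # X has shape (1000, 3)

X_pca = PCA(n_components=2).fit_transform(X)
X_tsne = TSNE(n_components=2, random_state=0).fit_transform(X)

# Coloring both 2-D scatter plots by `color` (position along the curve)
# typically shows t-SNE unrolling the S shape while PCA only rotates it.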



A brief introduction to t-SNE: dimensionality reduction plus visualization


t-distributed Stochastic Neighbor Embedding (t-SNE) is a tool for visualizing high-dimensional data. It converts similarities between data points into joint probabilities and tries to minimize the Kullback-Leibler divergence between the joint probabilities of the low-dimensional embedding and those of the high-dimensional data. t-SNE has a non-convex cost function, so different initializations can give different results. If the number of features is very high, it is strongly recommended to first use another dimensionality-reduction method (e.g. PCA for dense data or TruncatedSVD for sparse data) to reduce the number of dimensions to a reasonable amount (e.g. 50). This suppresses some noise and speeds up the computation of pairwise distances between samples.
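A minimal sketch of that recommended two-stage pipeline follows; the random input matrix and the variable names are illustrative assumptions, and the 50 PCA components simply echo the suggestion above.

# Sketch: reduce to ~50 dimensions with PCA first, then embed with t-SNE.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.RandomState(0)
X = rng.rand(500, 784)                      # stand-in for high-dimensional data

X_50 = PCA(n_components=50).fit_transform(X)                           # denoise / compress
X_2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X_50)

print(X_2d.shape)                           # (500, 2)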


t-SNE is currently among the best-performing methods for dimensionality reduction and visualization, but its drawbacks are also obvious: it uses a lot of memory and takes a long time to run. Still, when you want to classify high-dimensional data and are unsure whether the dataset is well separable (small within-class distances, large between-class distances), you can project it to a 2-D or 3-D space with t-SNE and inspect it. If the data is separable in the low-dimensional space, it is separable; if it is not separable in the low-dimensional space, the data may genuinely be inseparable, or it may simply be impossible to project it into a low-dimensional space. The principle of t-SNE (TSNE in sklearn) is to convert similarities between data points into probabilities: similarities in the original space are represented by Gaussian joint probabilities, while similarities in the embedding space are represented by a Student's t-distribution.


Reference: https://www.deeplearn.me/2137.html
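A short sketch of the separability check described above, assuming the digits dataset and matplotlib for plotting (both are assumptions made for this example):

# Sketch: project labeled high-dimensional data to 2-D with t-SNE and
# inspect class separability visually.
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)
X_2d = TSNE(n_components=2, random_state=0).fit_transform(X)

plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y, s=5, cmap='tab10')
plt.colorbar(label='digit')
plt.title('t-SNE projection of the digits dataset')
plt.show()
# Well-separated colored clusters suggest the classes are separable.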



How to use TSNE


from sklearn.manifold import TSNE

visual_model = TSNE(metric='precomputed', perplexity=10)  # t-distributed Stochastic Neighbor Embedding

visual = visual_model.fit_transform(dis)  # `dis` is a precomputed square distance matrix of shape (n_samples, n_samples)
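The two lines above assume that `dis` already holds a precomputed pairwise-distance matrix; the article does not show how it was built. One self-contained way to construct such a matrix, sketched here with made-up data and illustrative names, is:

# Sketch: build a precomputed distance matrix and embed it with t-SNE.
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.manifold import TSNE

rng = np.random.RandomState(0)
X = rng.rand(200, 30)                               # illustrative data

dis = pairwise_distances(X, metric='euclidean')     # shape (200, 200)

visual_model = TSNE(metric='precomputed', perplexity=10, init='random', random_state=0)
visual = visual_model.fit_transform(dis)
print(visual.shape)                                 # (200, 2)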



TSNE implementation (library source code)


class TSNE (found at: sklearn.manifold._t_sne)

class TSNE(BaseEstimator):
    """t-distributed Stochastic Neighbor Embedding.

    t-SNE [1] is a tool to visualize high-dimensional data. It converts
    similarities between data points to joint probabilities and tries
    to minimize the Kullback-Leibler divergence between the joint
    probabilities of the low-dimensional embedding and the
    high-dimensional data. t-SNE has a cost function that is not convex,
    i.e. with different initializations we can get different results.

    It is highly recommended to use another dimensionality reduction
    method (e.g. PCA for dense data or TruncatedSVD for sparse data)
    to reduce the number of dimensions to a reasonable amount (e.g. 50)
    if the number of features is very high. This will suppress some
    noise and speed up the computation of pairwise distances between
    samples. For more tips see Laurens van der Maaten's FAQ [2].

    Read more in the :ref:`User Guide <t_sne>`.

    Parameters
    ----------
    n_components : int, optional (default: 2)
        Dimension of the embedded space.

    perplexity : float, optional (default: 30)
        The perplexity is related to the number of nearest neighbors that
        is used in other manifold learning algorithms. Larger datasets
        usually require a larger perplexity. Consider selecting a value
        between 5 and 50. Different values can result in significantly
        different results.

    early_exaggeration : float, optional (default: 12.0)
        Controls how tight natural clusters in the original space are in
        the embedded space and how much space will be between them. For
        larger values, the space between natural clusters will be larger
        in the embedded space. Again, the choice of this parameter is not
        very critical. If the cost function increases during initial
        optimization, the early exaggeration factor or the learning rate
        might be too high.

    learning_rate : float, optional (default: 200.0)
        The learning rate for t-SNE is usually in the range [10.0, 1000.0]. If
        the learning rate is too high, the data may look like a 'ball' with any
        point approximately equidistant from its nearest neighbours. If the
        learning rate is too low, most points may look compressed in a dense
        cloud with few outliers. If the cost function gets stuck in a bad local
        minimum increasing the learning rate may help.

    n_iter : int, optional (default: 1000)
        Maximum number of iterations for the optimization. Should be at
        least 250.

    n_iter_without_progress : int, optional (default: 300)
        Maximum number of iterations without progress before we abort the
        optimization, used after 250 initial iterations with early
        exaggeration. Note that progress is only checked every 50 iterations
        so this value is rounded to the next multiple of 50.

        .. versionadded:: 0.17
           parameter *n_iter_without_progress* to control stopping criteria.

    min_grad_norm : float, optional (default: 1e-7)
        If the gradient norm is below this threshold, the optimization will
        be stopped.

    metric : string or callable, optional
        The metric to use when calculating distance between instances in a
        feature array. If metric is a string, it must be one of the options
        allowed by scipy.spatial.distance.pdist for its metric parameter, or
        a metric listed in pairwise.PAIRWISE_DISTANCE_FUNCTIONS.
        If metric is "precomputed", X is assumed to be a distance matrix.
        Alternatively, if metric is a callable function, it is called on each
        pair of instances (rows) and the resulting value recorded. The callable
        should take two arrays from X as input and return a value indicating
        the distance between them. The default is "euclidean" which is
        interpreted as squared euclidean distance.

    init : string or numpy array, optional (default: "random")
        Initialization of embedding. Possible options are 'random', 'pca',
        and a numpy array of shape (n_samples, n_components).
        PCA initialization cannot be used with precomputed distances and is
        usually more globally stable than random initialization.

    verbose : int, optional (default: 0)
        Verbosity level.

    random_state : int, RandomState instance, default=None
        Determines the random number generator. Pass an int for reproducible
        results across multiple function calls. Note that different
        initializations might result in different local minima of the cost
        function. See :term:`Glossary <random_state>`.

    method : string (default: 'barnes_hut')
        By default the gradient calculation algorithm uses Barnes-Hut
        approximation running in O(NlogN) time. method='exact'
        will run on the slower, but exact, algorithm in O(N^2) time. The
        exact algorithm should be used when nearest-neighbor errors need
        to be better than 3%. However, the exact method cannot scale to
        millions of examples.

        .. versionadded:: 0.17
           Approximate optimization *method* via the Barnes-Hut.

    angle : float (default: 0.5)
        Only used if method='barnes_hut'
        This is the trade-off between speed and accuracy for Barnes-Hut T-SNE.
        'angle' is the angular size (referred to as theta in [3]) of a distant
        node as measured from a point. If this size is below 'angle' then it is
        used as a summary node of all points contained within it.
        This method is not very sensitive to changes in this parameter
        in the range of 0.2 - 0.8. Angle less than 0.2 has quickly increasing
        computation time and angle greater 0.8 has quickly increasing error.

    n_jobs : int or None, optional (default=None)
        The number of parallel jobs to run for neighbors search. This parameter
        has no impact when ``metric="precomputed"`` or
        (``metric="euclidean"`` and ``method="exact"``).
        ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
        ``-1`` means using all processors. See :term:`Glossary <n_jobs>`
        for more details.

        .. versionadded:: 0.22

    Attributes
    ----------
    embedding_ : array-like, shape (n_samples, n_components)
        Stores the embedding vectors.

    kl_divergence_ : float
        Kullback-Leibler divergence after optimization.

    n_iter_ : int
        Number of iterations run.

    Examples
    --------

    >>> import numpy as np
    >>> from sklearn.manifold import TSNE
    >>> X = np.array([[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
    >>> X_embedded = TSNE(n_components=2).fit_transform(X)
    >>> X_embedded.shape
    (4, 2)

    References
    ----------

    [1] van der Maaten, L.J.P.; Hinton, G.E. Visualizing High-Dimensional Data
        Using t-SNE. Journal of Machine Learning Research 9:2579-2605, 2008.

    [2] van der Maaten, L.J.P. t-Distributed Stochastic Neighbor Embedding
        https://lvdmaaten.github.io/tsne/

    [3] L.J.P. van der Maaten. Accelerating t-SNE using Tree-Based Algorithms.
        Journal of Machine Learning Research 15(Oct):3221-3245, 2014.
        https://lvdmaaten.github.io/publications/papers/JMLR_2014.pdf
    """

    # Control the number of exploration iterations with early_exaggeration on
    _EXPLORATION_N_ITER = 250

    # Control the number of iterations between progress checks
    _N_ITER_CHECK = 50

    @_deprecate_positional_args
    def __init__(self, n_components=2, *, perplexity=30.0,
                 early_exaggeration=12.0, learning_rate=200.0, n_iter=1000,
                 n_iter_without_progress=300, min_grad_norm=1e-7,
                 metric="euclidean", init="random", verbose=0,
                 random_state=None, method='barnes_hut', angle=0.5,
                 n_jobs=None):
        self.n_components = n_components
        self.perplexity = perplexity
        self.early_exaggeration = early_exaggeration
        self.learning_rate = learning_rate
        self.n_iter = n_iter
        self.n_iter_without_progress = n_iter_without_progress
        self.min_grad_norm = min_grad_norm
        self.metric = metric
        self.init = init
        self.verbose = verbose
        self.random_state = random_state
        self.method = method
        self.angle = angle
        self.n_jobs = n_jobs

 

    def _fit(self, X, skip_num_points=0):
        """Private function to fit the model using X as training data."""
        if self.method not in ['barnes_hut', 'exact']:
            raise ValueError("'method' must be 'barnes_hut' or 'exact'")
        if self.angle < 0.0 or self.angle > 1.0:
            raise ValueError("'angle' must be between 0.0 - 1.0")
        if self.method == 'barnes_hut':
            X = self._validate_data(X, accept_sparse=['csr'],
                                    ensure_min_samples=2,
                                    dtype=[np.float32, np.float64])
        else:
            X = self._validate_data(X, accept_sparse=['csr', 'csc', 'coo'],
                                    dtype=[np.float32, np.float64])
        if self.metric == "precomputed":
            if isinstance(self.init, str) and self.init == 'pca':
                raise ValueError("The parameter init=\"pca\" cannot be "
                                 "used with metric=\"precomputed\".")
            if X.shape[0] != X.shape[1]:
                raise ValueError("X should be a square distance matrix")
            check_non_negative(X, "TSNE.fit(). With metric='precomputed', X "
                                  "should contain positive distances.")
            if self.method == "exact" and issparse(X):
                raise TypeError('TSNE with method="exact" does not accept '
                                'sparse precomputed distance matrix. Use '
                                'method="barnes_hut" or provide the dense '
                                'distance matrix.')
        if self.method == 'barnes_hut' and self.n_components > 3:
            raise ValueError("'n_components' should be inferior to 4 for the "
                             "barnes_hut algorithm as it relies on "
                             "quad-tree or oct-tree.")
        random_state = check_random_state(self.random_state)
        if self.early_exaggeration < 1.0:
            raise ValueError("early_exaggeration must be at least 1, but is "
                             "{}".format(self.early_exaggeration))
        if self.n_iter < 250:
            raise ValueError("n_iter should be at least 250")

        n_samples = X.shape[0]

        neighbors_nn = None
        if self.method == "exact":
            # Retrieve the distance matrix, either using the precomputed one
            # or computing it.
            if self.metric == "precomputed":
                distances = X
            else:
                if self.verbose:
                    print("[t-SNE] Computing pairwise distances...")
                if self.metric == "euclidean":
                    distances = pairwise_distances(X, metric=self.metric,
                                                   squared=True)
                else:
                    distances = pairwise_distances(X, metric=self.metric,
                                                   n_jobs=self.n_jobs)
                if np.any(distances < 0):
                    raise ValueError("All distances should be positive, the "
                                     "metric given is not correct")

            # compute the joint probability distribution for the input space
            P = _joint_probabilities(distances, self.perplexity, self.verbose)
            assert np.all(np.isfinite(P)), "All probabilities should be finite"
            assert np.all(P >= 0), "All probabilities should be non-negative"
            assert np.all(P <= 1), ("All probabilities should be less "
                                    "or then equal to one")
        else:
            # Compute the number of nearest neighbors to find.
            # LvdM uses 3 * perplexity as the number of neighbors.
            # In the event that we have very small # of points
            # set the neighbors to n - 1.
            n_neighbors = min(n_samples - 1, int(3. * self.perplexity + 1))

            if self.verbose:
                print("[t-SNE] Computing {} nearest neighbors..."
                      .format(n_neighbors))

            # Find the nearest neighbors for every point
            knn = NearestNeighbors(algorithm='auto',
                                   n_jobs=self.n_jobs,
                                   n_neighbors=n_neighbors,
                                   metric=self.metric)
            t0 = time()
            knn.fit(X)
            duration = time() - t0
            if self.verbose:
                print("[t-SNE] Indexed {} samples in {:.3f}s..."
                      .format(n_samples, duration))

            t0 = time()
            distances_nn = knn.kneighbors_graph(mode='distance')
            duration = time() - t0
            if self.verbose:
                print("[t-SNE] Computed neighbors for {} samples "
                      "in {:.3f}s...".format(n_samples, duration))

            # Free the memory used by the ball_tree
            del knn

            if self.metric == "euclidean":
                # knn return the euclidean distance but we need it squared
                # to be consistent with the 'exact' method. Note that the
                # method was derived using the euclidean method as in the
                # input space. Not sure of the implication of using a
                # different metric.
                distances_nn.data **= 2

            # compute the joint probability distribution for the input space
            P = _joint_probabilities_nn(distances_nn, self.perplexity,
                                        self.verbose)

        if isinstance(self.init, np.ndarray):
            X_embedded = self.init
        elif self.init == 'pca':
            pca = PCA(n_components=self.n_components,
                      svd_solver='randomized',
                      random_state=random_state)
            X_embedded = pca.fit_transform(X).astype(np.float32, copy=False)
        elif self.init == 'random':
            # The embedding is initialized with iid samples from Gaussians
            # with standard deviation 1e-4.
            X_embedded = 1e-4 * random_state.randn(
                n_samples, self.n_components).astype(np.float32)
        else:
            raise ValueError("'init' must be 'pca', 'random', or "
                             "a numpy array")

        # Degrees of freedom of the Student's t-distribution. The suggestion
        # degrees_of_freedom = n_components - 1 comes from
        # "Learning a Parametric Embedding by Preserving Local Structure"
        # Laurens van der Maaten, 2009.
        degrees_of_freedom = max(self.n_components - 1, 1)

        return self._tsne(P, degrees_of_freedom, n_samples,
                          X_embedded=X_embedded,
                          neighbors=neighbors_nn,
                          skip_num_points=skip_num_points)

 

    def _tsne(self, P, degrees_of_freedom, n_samples, X_embedded,
              neighbors=None, skip_num_points=0):
        """Runs t-SNE."""
        # t-SNE minimizes the Kullback-Leiber divergence of the Gaussians P
        # and the Student's t-distributions Q. The optimization algorithm that
        # we use is batch gradient descent with two stages:
        # * initial optimization with early exaggeration and momentum at 0.5
        # * final optimization with momentum at 0.8
        params = X_embedded.ravel()

        opt_args = {
            "it": 0,
            "n_iter_check": self._N_ITER_CHECK,
            "min_grad_norm": self.min_grad_norm,
            "learning_rate": self.learning_rate,
            "verbose": self.verbose,
            "kwargs": dict(skip_num_points=skip_num_points),
            "args": [P, degrees_of_freedom, n_samples, self.n_components],
            "n_iter_without_progress": self._EXPLORATION_N_ITER,
            "n_iter": self._EXPLORATION_N_ITER,
            "momentum": 0.5,
        }
        if self.method == 'barnes_hut':
            obj_func = _kl_divergence_bh
            opt_args['kwargs']['angle'] = self.angle
            # Repeat verbose argument for _kl_divergence_bh
            opt_args['kwargs']['verbose'] = self.verbose
            # Get the number of threads for gradient computation here to
            # avoid recomputing it at each iteration.
            opt_args['kwargs']['num_threads'] = _openmp_effective_n_threads()
        else:
            obj_func = _kl_divergence

        # Learning schedule (part 1): do 250 iteration with lower momentum but
        # higher learning rate controlled via the early exaggeration parameter
        P *= self.early_exaggeration
        params, kl_divergence, it = _gradient_descent(obj_func, params,
                                                      **opt_args)
        if self.verbose:
            print("[t-SNE] KL divergence after %d iterations with early "
                  "exaggeration: %f" % (it + 1, kl_divergence))

        # Learning schedule (part 2): disable early exaggeration and finish
        # optimization with a higher momentum at 0.8
        P /= self.early_exaggeration
        remaining = self.n_iter - self._EXPLORATION_N_ITER
        if it < self._EXPLORATION_N_ITER or remaining > 0:
            opt_args['n_iter'] = self.n_iter
            opt_args['it'] = it + 1
            opt_args['momentum'] = 0.8
            opt_args['n_iter_without_progress'] = self.n_iter_without_progress
            params, kl_divergence, it = _gradient_descent(obj_func, params,
                                                          **opt_args)

        # Save the final number of iterations
        self.n_iter_ = it

        if self.verbose:
            print("[t-SNE] KL divergence after %d iterations: %f"
                  % (it + 1, kl_divergence))

        X_embedded = params.reshape(n_samples, self.n_components)
        self.kl_divergence_ = kl_divergence

        return X_embedded

 

    def fit_transform(self, X, y=None):
        """Fit X into an embedded space and return that transformed output.

        Parameters
        ----------
        X : array, shape (n_samples, n_features) or (n_samples, n_samples)
            If the metric is 'precomputed' X must be a square distance
            matrix. Otherwise it contains a sample per row. If the method
            is 'exact', X may be a sparse matrix of type 'csr', 'csc'
            or 'coo'. If the method is 'barnes_hut' and the metric is
            'precomputed', X may be a precomputed sparse graph.

        y : Ignored

        Returns
        -------
        X_new : array, shape (n_samples, n_components)
            Embedding of the training data in low-dimensional space.
        """
        embedding = self._fit(X)
        self.embedding_ = embedding
        return self.embedding_

    def fit(self, X, y=None):
        """Fit X into an embedded space.

        Parameters
        ----------
        X : array, shape (n_samples, n_features) or (n_samples, n_samples)
            If the metric is 'precomputed' X must be a square distance
            matrix. Otherwise it contains a sample per row. If the method
            is 'exact', X may be a sparse matrix of type 'csr', 'csc'
            or 'coo'. If the method is 'barnes_hut' and the metric is
            'precomputed', X may be a precomputed sparse graph.

        y : Ignored
        """
        self.fit_transform(X)
        return self
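As the source above shows, `fit` simply delegates to `fit_transform`, and the results live in the `embedding_`, `kl_divergence_` and `n_iter_` attributes documented in the class docstring. A short usage sketch follows (the iris dataset is an assumption made for the example); note that TSNE exposes no separate `transform` method, so it can only embed the data it was fitted on.

# Sketch: fit_transform stores the embedding and optimization diagnostics.
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE

X, _ = load_iris(return_X_y=True)
tsne = TSNE(n_components=2, random_state=0)

X_2d = tsne.fit_transform(X)          # same array as tsne.embedding_
print(X_2d.shape)                     # (150, 2)
print(tsne.kl_divergence_)            # final KL divergence of the optimization
print(tsne.n_iter_)                   # number of gradient-descent iterations run
# There is no tsne.transform(): t-SNE only embeds the data it was fitted on.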





