Python: wordcloud.wordcloud() function: parameter analysis and explanation


class WordCloud
Found at: wordcloud.wordcloud

class WordCloud(object):

   """Word cloud object for generating and drawing.

 

   Parameters

   ----------

   font_path : string

   Font path to the font that will be used (OTF or TTF).

   Defaults to DroidSansMono path on a Linux machine. If you are on another OS or don't have this font, you need to adjust this path.

 

   width : int (default=400)

   Width of the canvas.

 

   height : int (default=200)

   Height of the canvas.

 

   prefer_horizontal : float (default=0.90)

   The ratio of times to try horizontal fitting as opposed to vertical. If prefer_horizontal < 1, the algorithm will try rotating the word if it doesn't fit. (There is currently no built-in way to get only vertical words.)

 

   mask : nd-array or None (default=None)

   If not None, gives a binary mask on where to draw words. If mask is not None, width and height will be ignored and the shape of mask will be used instead. All white (#FF or #FFFFFF) entries will be considered "masked out" while other entries will be free to draw on. [This changed in the most recent version!]

 

   scale : float (default=1)

   Scaling between computation and drawing. For large word-cloud images, using scale instead of a larger canvas size is significantly faster, but might lead to a coarser fit for the words.

 

   min_font_size : int (default=4)

   Smallest font size to use. Will stop when there is no more room in this size.

 

   font_step : int (default=1)

   Step size for the font. font_step > 1 might speed up computation but give a worse fit.

 

   max_words : number (default=200)

   The maximum number of words.

 

   stopwords : set of strings or None

   The words that will be eliminated. If None, the built-in STOPWORDS list will be used.

 

   background_color : color value (default="black")

   Background color for the word cloud image.

 

   max_font_size : int or None (default=None)

   Maximum font size for the largest word. If None, height of the image is used.

 

   mode : string (default="RGB")

   Transparent background will be generated when mode is "RGBA" and background_color is None.

 

   relative_scaling : float (default=.5)

   Importance of relative word frequencies for font-size. With relative_scaling=0, only word-ranks are considered. With relative_scaling=1, a word that is twice as frequent will have twice the size. If you want to consider the word frequencies and not only their rank, relative_scaling around .5 often looks good.

 

   .. versionchanged:: 2.0

   Default is now 0.5.

 

   color_func : callable, default=None

   Callable with parameters word, font_size, position, orientation, font_path, random_state that returns a PIL color for each word.

   Overrides "colormap". See colormap for specifying a matplotlib colormap instead.

 

   regexp : string or None (optional)

   Regular expression to split the input text into tokens in process_text.

   If None is specified, ``r"\w[\w']+"`` is used.

 

   collocations : bool, default=True

   Whether to include collocations (bigrams) of two words.

 

   .. versionadded:: 2.0

 

   colormap : string or matplotlib colormap, default="viridis"

   Matplotlib colormap to randomly draw colors from for each word.

   Ignored if "color_func" is specified.

 

   .. versionadded:: 2.0

 

   normalize_plurals : bool, default=True

   Whether to remove trailing 's' from words. If True and a word appears with and without a trailing 's', the one with trailing 's' is removed and its counts are added to the version without trailing 's', unless the word ends with 'ss'.
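The color_func parameter described above is easiest to understand with a concrete callable. The sketch below defines a hypothetical grey_color_func (the name and the grey palette are illustrative, not part of wordcloud). Its keyword arguments match the ones generate_from_frequencies() and recolor() pass in the source shown later in this article, and it returns an "hsl(...)" string, one of the color formats PIL accepts.

```python
import random


def grey_color_func(word, font_size, position, orientation,
                    font_path=None, random_state=None):
    """Hypothetical color_func that gives every word a random light grey.

    The parameters mirror the keyword arguments WordCloud passes to its
    color_func; only random_state is actually used here.
    """
    if random_state is None:
        random_state = random.Random()
    # Lightness between 60% and 100%; PIL accepts "hsl(h, s%, l%)" strings.
    lightness = random_state.randint(60, 100)
    return "hsl(0, 0%, {}%)".format(lightness)


# Call it the way generate_from_frequencies() calls self.color_func.
color = grey_color_func("python", font_size=40, position=(10, 20),
                        orientation=None, font_path="DroidSansMono.ttf",
                        random_state=random.Random(0))
print(color)
```

Under the same assumptions, it would be passed as WordCloud(color_func=grey_color_func), or applied to an existing layout with wc.recolor(color_func=grey_color_func). Because color_func overrides colormap, a colormap argument is then ignored.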

 

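class WordCloud, found at: wordcloud.wordcloud

class WordCloud(object):

Word cloud object for generating and drawing.


Parameters

----------

font_path : string

Path to the font (OTF or TTF) that will be used.

Defaults to the DroidSansMono path on a Linux machine. If you are on another OS or don't have this font, you need to adjust this path.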


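width : int (default=400)

Width of the canvas.


height : int (default=200)

Height of the canvas.


prefer_horizontal : float (default=0.90)

The ratio of times to try horizontal fitting as opposed to vertical. If prefer_horizontal < 1, the algorithm will try rotating a word that doesn't fit. (There is currently no built-in way to get only vertical words.)


mask : nd-array or None (default=None)

If not None, gives a binary mask of where words may be drawn. If mask is not None, width and height are ignored and the shape of the mask is used instead. All white (#FF or #FFFFFF) entries are considered "masked out", while all other entries are free to draw on. [This changed in the most recent version!]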


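scale : float (default=1)

Scaling between computation and drawing. For large word-cloud images, using scale instead of a larger canvas size is significantly faster, but might lead to a coarser fit for the words.


min_font_size : int (default=4)

Smallest font size to use. Drawing stops when there is no more room at this size.


font_step : int (default=1)

Step size for the font. font_step > 1 might speed up computation but gives a worse fit.


max_words : number (default=200)

The maximum number of words.


stopwords : set of strings or None

The words that will be eliminated. If None, the built-in STOPWORDS list is used.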


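background_color : color value (default="black")

Background color for the word cloud image.


max_font_size : int or None (default=None)

Maximum font size for the largest word. If None, the height of the image is used as the upper limit: generate_from_frequencies() lays out the two most frequent words at up to that size and takes the harmonic mean of their resulting font sizes (or the full height when there is only one word).


mode : string (default="RGB")

A transparent background is generated when mode is "RGBA" and background_color is None.


relative_scaling : float (default=.5)

Importance of relative word frequencies for font size. With relative_scaling=0, only word ranks are considered. With relative_scaling=1, a word that is twice as frequent will have twice the size. If you want to take word frequencies into account and not only their rank, a relative_scaling of around .5 often looks good.


.. versionchanged:: 2.0

The default is now 0.5.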


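color_func : callable, default=None

Callable with parameters word, font_size, position, orientation, font_path, random_state that returns a PIL color for each word.

Overrides "colormap". See colormap for specifying a matplotlib colormap instead.


regexp : string or None (optional)

Regular expression used in process_text to split the input text into tokens.

If None is specified, ``r"\w[\w']+"`` is used.


collocations : bool, default=True

Whether to include collocations (bigrams) of two words.


.. versionadded:: 2.0


colormap : string or matplotlib colormap, default="viridis"

Matplotlib colormap from which a color is drawn at random for each word.

Ignored if "color_func" is specified.


.. versionadded:: 2.0


normalize_plurals : bool, default=True

Whether to remove a trailing 's' from words. If True and a word appears both with and without a trailing 's', the form with the trailing 's' is removed and its count is added to the form without it, unless the word ends with 'ss'.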
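The normalize_plurals rule just described can be re-created in a few lines. This is not wordcloud's own code: in the source shown later, the merging is delegated to the helpers process_tokens() and unigrams_and_bigrams(), whose bodies are not reproduced in this article. merge_plurals is a hypothetical name, and the sketch only implements the rule as the docstring states it.

```python
from collections import Counter


def merge_plurals(words):
    """Sketch of the normalize_plurals rule (not wordcloud's own code).

    Counts the tokens, then folds "words" into "word" when both forms occur,
    leaving words that end in "ss" (e.g. "class") untouched.
    """
    counts = Counter(words)
    for word in list(counts):  # iterate over a copy, since we pop entries
        if word.endswith("s") and not word.endswith("ss"):
            singular = word[:-1]
            if singular in counts:
                counts[singular] += counts.pop(word)
    return dict(counts)


tokens = ["cloud", "clouds", "clouds", "class", "glass", "news"]
print(merge_plurals(tokens))  # {'cloud': 3, 'class': 1, 'glass': 1, 'news': 1}
```

"news" survives because "new" never occurs in the input, and "class" and "glass" are protected by the 'ss' exception.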

   Attributes

   ----------

   ``words_`` : dict of string to float

   Word tokens with associated frequency.

 

   .. versionchanged:: 2.0

   ``words_`` is now a dictionary

 

   ``layout_`` : list of tuples ((string, float), int, (int, int), int, color)

   Encodes the fitted word cloud: for each word, the (word, frequency) pair, font size, position, orientation and color.

 

   Notes

   -----

   Larger canvases will make the code significantly slower. If you need a large word cloud, try a smaller canvas size and set the scale parameter.
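The advice above follows from how the code later in this article draws the cloud: generate_from_frequencies() searches for free space on the width x height canvas, and to_image() only multiplies the canvas size, every font size and every position by scale at render time. scale_layout below is a hypothetical, stdlib-only sketch of that arithmetic; its (word, font_size, (row, column)) triples stand in for the entries of layout_.

```python
def scale_layout(width, height, layout, scale):
    """Sketch of the arithmetic to_image() applies to the fitted layout.

    ``layout`` holds (word, font_size, (row, col)) triples computed on the
    width x height canvas; the result is the scaled canvas size plus the
    font size and (x, y) drawing position of every word.
    """
    canvas = (int(width * scale), int(height * scale))
    drawn = []
    for word, font_size, (row, col) in layout:
        # PIL's draw.text() takes (x, y) = (column, row), hence the swap
        pos = (int(col * scale), int(row * scale))
        drawn.append((word, int(font_size * scale), pos))
    return canvas, drawn


canvas, drawn = scale_layout(400, 200, [("python", 48, (30, 120))], scale=2.5)
print(canvas)  # (1000, 500)
print(drawn)   # [('python', 120, (300, 75))]
```

Only the final drawing happens at the larger size; the expensive placement search ran on the small canvas, which is why the result can fit the words more coarsely than a layout computed directly at full size.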

 

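   The algorithm might give more weight to the ranking of the words than to their actual frequencies, depending on ``max_font_size`` and the scaling heuristic.

   """

Attributes

----------

``words_`` : dict of string to float

Word tokens with associated frequency.


.. versionchanged:: 2.0

``words_`` is now a dictionary.


``layout_`` : list of tuples ((string, float), int, (int, int), int, color)

Encodes the fitted word cloud: for each word, the (word, frequency) pair, font size, position, orientation and color.


Notes

-----

Larger canvases make the code significantly slower. If you need a large word cloud, try a smaller canvas size and set the scale parameter.


Depending on ``max_font_size`` and the scaling heuristic, the algorithm might give more weight to the ranking of the words than to their actual frequencies.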
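The "scaling heuristic" in the last note is the font-size update inside generate_from_frequencies() in the source below. font_sizes (a hypothetical name) replays only that update for known frequencies and leaves out collision handling, so each value is the size a word starts from before it is shrunk by font_step to fit.

```python
def font_sizes(frequencies, max_font_size, relative_scaling):
    """Replay the font-size update from generate_from_frequencies() (sketch).

    Collision handling is left out: a real run also shrinks a word by
    font_step until it fits, and that smaller size carries over as well.
    """
    items = sorted(frequencies.items(), key=lambda kv: kv[1], reverse=True)
    max_freq = float(items[0][1])
    rs = relative_scaling
    font_size, last_freq, sizes = max_font_size, 1.0, {}
    for word, freq in items:
        freq = freq / max_freq  # normalised so the top word has frequency 1
        if rs != 0:
            # blend "scale by frequency ratio" with "keep the previous size"
            font_size = int(round((rs * (freq / last_freq) + (1 - rs)) * font_size))
        sizes[word] = font_size
        last_freq = freq
    return sizes


freqs = {"cloud": 100, "word": 50, "python": 25}
print(font_sizes(freqs, 80, relative_scaling=1))    # {'cloud': 80, 'word': 40, 'python': 20}
print(font_sizes(freqs, 80, relative_scaling=0))    # {'cloud': 80, 'word': 80, 'python': 80}
print(font_sizes(freqs, 80, relative_scaling=0.5))  # {'cloud': 80, 'word': 60, 'python': 45}
```

With relative_scaling=0 the size is never recomputed from the frequencies, so differences between words come only from shrinking when a word does not fit. Because font_size carries over from the previous word, every size also depends on the words ranked above it, which is how ranking can outweigh the raw frequencies.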

   def __init__(self, font_path=None, width=400, height=200,

    margin=2,

       ranks_only=None, prefer_horizontal=.9, mask=None, scale=1,

       color_func=None, max_words=200, min_font_size=4,

       stopwords=None, random_state=None,

        background_color='black',

       max_font_size=None, font_step=1, mode="RGB",

       relative_scaling=.5, regexp=None, collocations=True,

       colormap=None, normalize_plurals=True):

       if font_path is None:

           font_path = FONT_PATH

       if color_func is None and colormap is None:

           # we need a color map

           import matplotlib

           version = matplotlib.__version__

           if version[0] < "2" and version[2] < "5":

               colormap = "hsv"

           else:

               colormap = "viridis"

       self.colormap = colormap

       self.collocations = collocations

       self.font_path = font_path

       self.width = width

       self.height = height

       self.margin = margin

       self.prefer_horizontal = prefer_horizontal

       self.mask = mask

       self.scale = scale

       self.color_func = color_func or colormap_color_func(colormap)

       self.max_words = max_words

       self.stopwords = stopwords if stopwords is not None else STOPWORDS

       self.min_font_size = min_font_size

       self.font_step = font_step

       self.regexp = regexp

       if isinstance(random_state, int):

           random_state = Random(random_state)

       self.random_state = random_state

       self.background_color = background_color

       self.max_font_size = max_font_size

       self.mode = mode

       if relative_scaling < 0 or relative_scaling > 1:

           raise ValueError(

               "relative_scaling needs to be "

               "between 0 and 1, got %f." %

               relative_scaling)

       self.relative_scaling = relative_scaling

       if ranks_only is not None:

           warnings.warn("ranks_only is deprecated and will be removed as"
                         " it had no effect. Look into relative_scaling.",
                         DeprecationWarning)

       self.normalize_plurals = normalize_plurals

 

   def fit_words(self, frequencies):

       """Create a word_cloud from words and frequencies.

       Alias to generate_from_frequencies.

       Parameters

       ----------

       frequencies : dict from string to float

           Contains words and associated frequency.

       Returns

       -------

       self

       """

       return self.generate_from_frequencies(frequencies)

 

   def generate_from_frequencies(self, frequencies,

    max_font_size=None):

       """Create a word_cloud from words and frequencies.

       Parameters

       ----------

       frequencies : dict from string to float

           Contains words and associated frequency.

       max_font_size : int

           Use this font-size instead of self.max_font_size

       Returns

       -------

       self

       """

       # make sure frequencies are sorted and normalized

       frequencies = sorted(frequencies.items(), key=itemgetter(1),

        reverse=True)

       if len(frequencies) <= 0:

           raise ValueError("We need at least 1 word to plot a word cloud, "
                            "got %d." % len(frequencies))

       frequencies = frequencies[:self.max_words]

       # largest entry will be 1

       max_frequency = float(frequencies[0][1])

       frequencies = [(word, freq / max_frequency) for

           word, freq in frequencies]

       if self.random_state is not None:

           random_state = self.random_state

       else:

           random_state = Random()

       if self.mask is not None:

           mask = self.mask

           width = mask.shape[1]

           height = mask.shape[0]

           if mask.dtype.kind == 'f':

               warnings.warn("mask image should be unsigned byte between 0"
                             " and 255. Got a float array")

           if mask.ndim == 2:

               boolean_mask = mask == 255

           elif mask.ndim == 3:

               # if all channels are white, mask out

               boolean_mask = np.all(mask[:, :, :3] == 255, axis=-1)

           else:

               raise ValueError("Got mask of invalid shape: %s" %
                                str(mask.shape))

       else:

           boolean_mask = None

           height, width = self.height, self.width

       occupancy = IntegralOccupancyMap(height, width,

        boolean_mask)

       # create image

       img_grey = Image.new("L", (width, height))

       draw = ImageDraw.Draw(img_grey)

       img_array = np.asarray(img_grey)

       font_sizes, positions, orientations, colors = [], [], [], []

       last_freq = 1.

       if max_font_size is None:

           # if not provided use default font_size

           max_font_size = self.max_font_size

       if max_font_size is None:

           # figure out a good font size by trying to draw with

           # just the first two words

           if len(frequencies) == 1:

               # we only have one word. We make it big!

               font_size = self.height

           else:

               self.generate_from_frequencies(dict(frequencies[:2]),

                   max_font_size=self.height)

               # find font sizes

               sizes = [x[1] for x in self.layout_]

               try:

                   font_size = int(2 * sizes[0] * sizes[1] /

                       (sizes[0] + sizes[1]))

               # quick fix for if self.layout_ contains less than 2 values

               # on very small images it can be empty

               except IndexError:

                   try:

                       font_size = sizes[0]

                   except IndexError:

                       raise ValueError('canvas size is too small')

       else:

           font_size = max_font_size

       # we set self.words_ here because we called generate_from_frequencies

       # above... hurray for good design?

       self.words_ = dict(frequencies)

       # start drawing grey image

       for word, freq in frequencies:

           # select the font size

           rs = self.relative_scaling

           if rs != 0:

               font_size = int(round((rs * (freq / float(last_freq)) +

                           (1 - rs)) * font_size))

           if random_state.random() < self.prefer_horizontal:

               orientation = None

           else:

               orientation = Image.ROTATE_90

           tried_other_orientation = False

           while True:

               # try to find a position

               font = ImageFont.truetype(self.font_path, font_size)

               # transpose font optionally

               transposed_font = ImageFont.TransposedFont(

                   font, orientation=orientation)

               # get size of resulting text

               box_size = draw.textsize(word, font=transposed_font)

               # find possible places using integral image:

               result = occupancy.sample_position(box_size[1] + self.margin,
                                                  box_size[0] + self.margin,
                                                  random_state)

               if result is not None or font_size < self.min_font_size:

                   # either we found a place or font-size went too small

                   break

               # if we didn't find a place, make font smaller

               # but first try to rotate!

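               if not tried_other_orientation and self.prefer_horizontal < 1:

                   # note: both branches give ROTATE_90, so a word that started
                   # out vertical is retried vertically rather than horizontally
                   orientation = (Image.ROTATE_90 if orientation is None else
                                  Image.ROTATE_90)

                   tried_other_orientation = True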

               else:

                   font_size -= self.font_step

                   orientation = None

         

           if font_size < self.min_font_size:

               # we were unable to draw any more

               break

           x, y = np.array(result) + self.margin // 2

           # actually draw the text

           draw.text((y, x), word, fill="white", font=transposed_font)

           positions.append((x, y))

           orientations.append(orientation)

           font_sizes.append(font_size)

           colors.append(self.color_func(word, font_size=font_size,

                   position=(x, y),

                   orientation=orientation,

                   random_state=random_state,

                   font_path=self.font_path))

           # recompute integral image

           if self.mask is None:

               img_array = np.asarray(img_grey)

           else:

               img_array = np.asarray(img_grey) + boolean_mask

           # recompute bottom right

           # the order of the cumsum's is important for speed ?!

           occupancy.update(img_array, x, y)

           last_freq = freq

     

       self.layout_ = list(zip(frequencies, font_sizes, positions,

               orientations, colors))

       return self

 

   def process_text(self, text):

       """Splits a long text into words, eliminates the stopwords.

       Parameters

       ----------

       text : string

           The text to be processed.

       Returns

       -------

       words : dict (string, int)

           Word tokens with associated frequency.

       .. versionchanged:: 1.2.2

           Changed return type from list of tuples to dict.

       Notes

       -----

       There are better ways to do word tokenization, but I don't want to
       include all those things.

       """

       stopwords = set([i.lower() for i in self.stopwords])

       flags = re.UNICODE if sys.version < '3' and type(text) is unicode else 0

       regexp = self.regexp if self.regexp is not None else r"\w[\w']+"

       words = re.findall(regexp, text, flags)

       # remove stopwords

       words = [word for word in words if word.lower() not in stopwords]

       # remove 's

       words = [word[:-2] if word.lower().endswith("'s") else word for

           word in words]

       # remove numbers

       words = [word for word in words if not word.isdigit()]

       if self.collocations:

           word_counts = unigrams_and_bigrams(words, self.normalize_plurals)

       else:

           word_counts, _ = process_tokens(words, self.normalize_plurals)

       return word_counts

 

   def generate_from_text(self, text):

       """Generate wordcloud from text.

       The input "text" is expected to be a natural text. If you pass a sorted
       list of words, words will appear in your output twice. To remove this
       duplication, set ``collocations=False``.

       Calls process_text and generate_from_frequencies.

       .. versionchanged:: 1.2.2
           The argument of generate_from_frequencies() is no longer the
           return value of process_text().

       Returns

       -------

       self

       """

       words = self.process_text(text)

       self.generate_from_frequencies(words)

       return self

 

   def generate(self, text):

       """Generate wordcloud from text.

       The input "text" is expected to be a natural text. If you pass a sorted
       list of words, words will appear in your output twice. To remove this
       duplication, set ``collocations=False``.

       Alias to generate_from_text.

       Calls process_text and generate_from_frequencies.

       Returns

       -------

       self

       """

       return self.generate_from_text(text)

 

   def _check_generated(self):

       """Check if ``layout_`` was computed, otherwise raise error."""

       if not hasattr(self, "layout_"):

           raise ValueError("WordCloud has not been calculated, call generate"
                            " first.")

 

   def to_image(self):

       self._check_generated()

       if self.mask is not None:

           width = self.mask.shape[1]

           height = self.mask.shape[0]

       else:

           height, width = self.height, self.width

       img = Image.new(self.mode, (int(width * self.scale),

               int(height * self.scale)),

           self.background_color)

       draw = ImageDraw.Draw(img)

       for (word, count), font_size, position, orientation, color in self.layout_:

           font = ImageFont.truetype(self.font_path,

               int(font_size * self.scale))

           transposed_font = ImageFont.TransposedFont(

               font, orientation=orientation)

           pos = int(position[1] * self.scale), int(position[0] * self.scale)

           draw.text(pos, word, fill=color, font=transposed_font)

     

       return img

 

   def recolor(self, random_state=None, color_func=None,

    colormap=None):

       """Recolor existing layout.

       Applying a new coloring is much faster than generating the whole
       wordcloud.

       Parameters

       ----------

       random_state : RandomState, int, or None, default=None

           If not None, a fixed random state is used. If an int is given, this
           is used as seed for a random.Random state.

       color_func : function or None, default=None

           Function to generate new color from word count, font size, position
           and orientation. If None, self.color_func is used.

       colormap : string or matplotlib colormap, default=None

           Use this colormap to generate new colors. Ignored if color_func
           is specified. If None, self.color_func (or self.colormap) is used.

       Returns

       -------

       self

       """

       if isinstance(random_state, int):

           random_state = Random(random_state)

       self._check_generated()

       if color_func is None:

           if colormap is None:

               color_func = self.color_func

           else:

               color_func = colormap_color_func(colormap)

       self.layout_ = [(word_freq, font_size, position, orientation,

               color_func(word=word_freq[0], font_size=font_size,

                   position=position, orientation=orientation,

                   random_state=random_state,

                   font_path=self.font_path)) for

           word_freq, font_size, position, orientation, _ in

           self.layout_]

       return self

 

   def to_file(self, filename):

       """Export to image file.

       Parameters

       ----------

       filename : string

           Location to write to.

       Returns

       -------

       self

       """

       img = self.to_image()

       img.save(filename, optimize=True)

       return self

 

   def to_array(self):

       """Convert to numpy array.

       Returns

       -------

       image : nd-array of shape (height, width, 3) for mode "RGB"

           Word cloud image as numpy matrix.

       """

       return np.array(self.to_image())

 

   def __array__(self):

       """Convert to numpy array.

       Returns

       -------

       image : nd-array of shape (height, width, 3) for mode "RGB"

           Word cloud image as numpy matrix.

       """

       return self.to_array()

 

   def to_html(self):

       raise NotImplementedError("FIXME!!!")
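process_text() above chains a few simple filters before counting. The sketch below repeats those filtering steps with the standard library only. tokenize is a hypothetical name, and the final count is a plain collections.Counter, whereas the real method hands the tokens to unigrams_and_bigrams() or process_tokens(), which also deal with collocations and plurals.

```python
import re
from collections import Counter

# Default pattern used by process_text() when regexp is None.
DEFAULT_REGEXP = r"\w[\w']+"


def tokenize(text, stopwords, regexp=None):
    """Re-trace the filtering steps of WordCloud.process_text() (sketch)."""
    stopwords = {w.lower() for w in stopwords}
    pattern = regexp if regexp is not None else DEFAULT_REGEXP
    words = re.findall(pattern, text)
    # remove stopwords, case-insensitively
    words = [w for w in words if w.lower() not in stopwords]
    # remove a trailing 's
    words = [w[:-2] if w.lower().endswith("'s") else w for w in words]
    # remove pure numbers
    words = [w for w in words if not w.isdigit()]
    return Counter(words)


text = "The cloud's words: 2024 words, a cloud, the Python cloud."
counts = tokenize(text, stopwords={"the"})
print(counts)  # Counter({'cloud': 3, 'words': 2, 'Python': 1})
```

The default pattern needs at least two characters, so a one-letter token such as "a" is dropped by the regular expression itself, before the stopword filter ever sees it.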

