一、SQL查询
我们用wiki的数据为例
查询10条最多的页面编辑
SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE TIMESTAMP '2015-09-12 00:00:00' <= "__time" AND "__time" < TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10
我们在Query视图中操作
会有提示
选择Smart query limit会自动限制行数
Druid还提供了命令行查询sql 可以运行bin/dsql进行操作
Welcome to dsql, the command-line client for Druid SQL. Type "\h" for help. dsql>
提交sql
dsql> SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10; ┌──────────────────────────────────────────────────────────┬───────┐ │ page │ Edits │ ├──────────────────────────────────────────────────────────┼───────┤ │ Wikipedia:Vandalismusmeldung │ 33 │ │ User:Cyde/List of candidates for speedy deletion/Subpage │ 28 │ │ Jeremy Corbyn │ 27 │ │ Wikipedia:Administrators' noticeboard/Incidents │ 21 │ │ Flavia Pennetta │ 20 │ │ Total Drama Presents: The Ridonculous Race │ 18 │ │ User talk:Dudeperson176123 │ 18 │ │ Wikipédia:Le Bistro/12 septembre 2015 │ 18 │ │ Wikipedia:In the news/Candidates │ 17 │ │ Wikipedia:Requests for page protection │ 17 │ └──────────────────────────────────────────────────────────┴───────┘ Retrieved 10 rows in 0.06s.
还可以通过Http发送SQL
curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/tutorial/wikipedia-top-pages-sql.json http://localhost:8888/druid/v2/sql
可以得到如下结果
[ { "page": "Wikipedia:Vandalismusmeldung", "Edits": 33 }, { "page": "User:Cyde/List of candidates for speedy deletion/Subpage", "Edits": 28 }, { "page": "Jeremy Corbyn", "Edits": 27 }, { "page": "Wikipedia:Administrators' noticeboard/Incidents", "Edits": 21 }, { "page": "Flavia Pennetta", "Edits": 20 }, { "page": "Total Drama Presents: The Ridonculous Race", "Edits": 18 }, { "page": "User talk:Dudeperson176123", "Edits": 18 }, { "page": "Wikipédia:Le Bistro/12 septembre 2015", "Edits": 18 }, { "page": "Wikipedia:In the news/Candidates", "Edits": 17 }, { "page": "Wikipedia:Requests for page protection", "Edits": 17 } ]
更多SQL示例
时间查询
SELECT FLOOR(__time to HOUR) AS HourTime, SUM(deleted) AS LinesDeleted FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY 1
分组查询
SELECT channel, page, SUM(added) FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY channel, page ORDER BY SUM(added) DESC
查询原始数据
SELECT user, page FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 02:00:00' AND TIMESTAMP '2015-09-12 03:00:00' LIMIT 5
定时查询
也可以在dsql里操作
dsql> EXPLAIN PLAN FOR SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10; │ DruidQueryRel(query=[{"queryType":"topN","dataSource":{"type":"table","name":"wikipedia"},"virtualColumns":[],"dimension":{"type":"default","dimension":"page","outputName":"d0","outputType":"STRING"},"metric":{"type":"numeric","metric":"a0"},"threshold":10,"intervals":{"type":"intervals","intervals":["2015-09-12T00:00:00.000Z/2015-09-13T00:00:00.001Z"]},"filter":null,"granularity":{"type":"all"},"aggregations":[{"type":"count","name":"a0"}],"postAggregations":[],"context":{},"descending":false}], signature=[{d0:STRING, a0:LONG}]) │ └─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ Retrieved 1 row in 0.03s.
二、原生JSON查询
Druid支持基于Json的查询
{ "queryType" : "topN", "dataSource" : "wikipedia", "intervals" : ["2015-09-12/2015-09-13"], "granularity" : "all", "dimension" : "page", "metric" : "count", "threshold" : 10, "aggregations" : [ { "type" : "count", "name" : "count" } ] }
把json粘贴到json 查询模式窗口
Json查询是通过向router和broker发送请求
curl -X POST '<queryable_host>:<port>/druid/v2/?pretty' -H 'Content-Type:application/json' -H 'Accept:application/json' -d @<query_json_file>
Druid提供了丰富的查询方式
Aggregation查询
Timeseries查询
{ "queryType": "timeseries", "dataSource": "sample_datasource", "granularity": "day", "descending": "true", "filter": { "type": "and", "fields": [ { "type": "selector", "dimension": "sample_dimension1", "value": "sample_value1" }, { "type": "or", "fields": [ { "type": "selector", "dimension": "sample_dimension2", "value": "sample_value2" }, { "type": "selector", "dimension": "sample_dimension3", "value": "sample_value3" } ] } ] }, "aggregations": [ { "type": "longSum", "name": "sample_name1", "fieldName": "sample_fieldName1" }, { "type": "doubleSum", "name": "sample_name2", "fieldName": "sample_fieldName2" } ], "postAggregations": [ { "type": "arithmetic", "name": "sample_divide", "fn": "/", "fields": [ { "type": "fieldAccess", "name": "postAgg__sample_name1", "fieldName": "sample_name1" }, { "type": "fieldAccess", "name": "postAgg__sample_name2", "fieldName": "sample_name2" } ] } ], "intervals": [ "2012-01-01T00:00:00.000/2012-01-03T00:00:00.000" ] }
TopN查询
{ "queryType": "topN", "dataSource": "sample_data", "dimension": "sample_dim", "threshold": 5, "metric": "count", "granularity": "all", "filter": { "type": "and", "fields": [ { "type": "selector", "dimension": "dim1", "value": "some_value" }, { "type": "selector", "dimension": "dim2", "value": "some_other_val" } ] }, "aggregations": [ { "type": "longSum", "name": "count", "fieldName": "count" }, { "type": "doubleSum", "name": "some_metric", "fieldName": "some_metric" } ], "postAggregations": [ { "type": "arithmetic", "name": "average", "fn": "/", "fields": [ { "type": "fieldAccess", "name": "some_metric", "fieldName": "some_metric" }, { "type": "fieldAccess", "name": "count", "fieldName": "count" } ] } ], "intervals": [ "2013-08-31T00:00:00.000/2013-09-03T00:00:00.000" ] }
GroupBy查询
{ "queryType": "groupBy", "dataSource": "sample_datasource", "granularity": "day", "dimensions": ["country", "device"], "limitSpec": { "type": "default", "limit": 5000, "columns": ["country", "data_transfer"] }, "filter": { "type": "and", "fields": [ { "type": "selector", "dimension": "carrier", "value": "AT&T" }, { "type": "or", "fields": [ { "type": "selector", "dimension": "make", "value": "Apple" }, { "type": "selector", "dimension": "make", "value": "Samsung" } ] } ] }, "aggregations": [ { "type": "longSum", "name": "total_usage", "fieldName": "user_count" }, { "type": "doubleSum", "name": "data_transfer", "fieldName": "data_transfer" } ], "postAggregations": [ { "type": "arithmetic", "name": "avg_usage", "fn": "/", "fields": [ { "type": "fieldAccess", "fieldName": "data_transfer" }, { "type": "fieldAccess", "fieldName": "total_usage" } ] } ], "intervals": [ "2012-01-01T00:00:00.000/2012-01-03T00:00:00.000" ], "having": { "type": "greaterThan", "aggregation": "total_usage", "value": 100 } }
Metadata查询
TimeBoundary 查询
{ "queryType" : "timeBoundary", "dataSource": "sample_datasource", "bound" : < "maxTime" | "minTime" > # optional, defaults to returning both timestamps if not set "filter" : { "type": "and", "fields": [<filter>, <filter>, ...] } # optional }
SegmentMetadata查询
{ "queryType":"segmentMetadata", "dataSource":"sample_datasource", "intervals":["2013-01-01/2014-01-01"] }
DatasourceMetadata查询
{ "queryType" : "dataSourceMetadata", "dataSource": "sample_datasource" }
Search查询
{ "queryType": "search", "dataSource": "sample_datasource", "granularity": "day", "searchDimensions": [ "dim1", "dim2" ], "query": { "type": "insensitive_contains", "value": "Ke" }, "sort" : { "type": "lexicographic" }, "intervals": [ "2013-01-01T00:00:00.000/2013-01-03T00:00:00.000" ] }
查询建议
用Timeseries和TopN替代GroupBy
取消查询
DELETE /druid/v2/{queryId}
curl -X DELETE "http://host:port/druid/v2/abc123"
查询失败
{ "error" : "Query timeout", "errorMessage" : "Timeout waiting for task.", "errorClass" : "java.util.concurrent.TimeoutException", "host" : "druid1.example.com:8083" }
三、CURL
基于Http的查询
curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/tutorial/wikipedia-top-pages.json http://localhost:8888/druid/v2?pretty
四、客户端查询
客户端查询是基于json的
具体查看 https://druid.apache.org/libraries.html
比如python查询的pydruid
from pydruid.client import * from pylab import plt query = PyDruid(druid_url_goes_here, 'druid/v2') ts = query.timeseries( datasource='twitterstream', granularity='day', intervals='2014-02-02/p4w', aggregations={'length': doublesum('tweet_length'), 'count': doublesum('count')}, post_aggregations={'avg_tweet_length': (Field('length') / Field('count'))}, filter=Dimension('first_hashtag') == 'sochi2014' ) df = query.export_pandas() df['timestamp'] = df['timestamp'].map(lambda x: x.split('T')[0]) df.plot(x='timestamp', y='avg_tweet_length', ylim=(80, 140), rot=20, title='Sochi 2014') plt.ylabel('avg tweet length (chars)') plt.show()