复制表schema如下:``` CREATE TABLE pub_report_query_keyword_creative_day_local ( chan_pub_id
Int64, channel_id
Int64, publisher_id
Int64, pub_account_id
Int64, pub_campaign_id
Int64, pub_adgroup_id
Int64, data_date
Date ) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/xx/pub_report_query_keyword_creative_day_local_v1', '{replica}') PARTITION BY (toYYYYMMDD(data_date)) PRIMARY KEY (chan_pub_id, pub_campaign_id, pub_adgroup_id) ORDER BY (chan_pub_id, pub_campaign_id, pub_adgroup_id) SETTINGS min_bytes_for_wide_part = '100M', index_granularity = 8192
由于我们会重做某天或者某段时间的部分数据,所以重做的思路如下:
1、创建临时本地表:
CREATE TABLE temp_0_5_20200619_7ec34a581b1d11eb80c102ed675fb066_1604473288 ( chan_pub_id
Int64, channel_id
Int64, publisher_id
Int64, pub_account_id
Int64, pub_campaign_id
Int64, pub_adgroup_id
Int64, device
Int8, data_date
Date ) ENGINE = MergeTree() PARTITION BY (toYYYYMMDD(data_date)) PRIMARY KEY (chan_pub_id, pub_campaign_id, pub_adgroup_id) ORDER BY (chan_pub_id, pub_campaign_id, pub_adgroup_id) SETTINGS index_granularity = 8192
2、将更新的部分数据写入临时表
3、将在原表不在临时表的数据写入临时表,这样临时表数据就是最新最完整的,示例
sql:insert into temp_0_5_20200619_7ec34a581b1d11eb80c102ed675fb066_1604473288 select * from pub_report_query_keyword_creative_day_local where data_date=xxx and chan_pub_id not in (select chan_pub_id temp_0_5_20200619_7ec34a581b1d11eb80c102ed675fb066_1604473288 group by chan_pub_id order by null) 4、将临时表的分区replace到原表,实例sql:alter table pub_report_query_keyword_creative_day_local replace partition tuple(20201013) from temp_0_5_20200619_7ec34a581b1d11eb80c102ed675fb066_1604473288
5、客户端没有任何错误,提示sql执行成功,但是ck的error log会有如下错误,并发(一个表不同分区并发)约频繁,错误越多:
2020.11.11 09:06:39.641967 [ 24917 ] {}
pub_report_query_keyword_creative_day_local: DB::StorageReplicatedMergeTree::queueTask()::<lambda(DB::StorageReplicatedMergeTree::LogEntryPtr&)>: Code: 235, e.displayText() = DB::Exception: Part 20201110-1-3_80_80_1 (state Committed) already exists, Stack trace (when copying this message, always include the lines below):
- /build/obj-x86_64-linux-gnu/../contrib/poco/Foundation/src/Exception.cpp:27: Poco::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits
, std::__1::allocator
> const&, int) @ 0x18e2f1a0 in /usr/lib/debug/usr/bin/clickhouse
- /build/obj-x86_64-linux-gnu/../src/Common/Exception.cpp:37: DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits
, std::__1::allocator
> const&, int) @ 0xe73a32d in /usr/lib/debug/usr/bin/clickhouse
- /build/obj-x86_64-linux-gnu/../src/Storages/MergeTree/MergeTreeData.cpp:1927: DB::MergeTreeData::renameTempPartAndReplace(std::__1::shared_ptrDB::IMergeTreeDataPart&, SimpleIncrement*, DB::MergeTreeData::Transaction*, std::__1::unique_lockstd::__1::mutex&, std::__1::vector<std::__1::shared_ptr<DB::IMergeTreeDataPart const>, std::__1::allocator<std::__1::shared_ptr<DB::IMergeTreeDataPart const> > >*) (.cold) @ 0x1626022a in /usr/lib/debug/usr/bin/clickhouse
- /build/obj-x86_64-linux-gnu/../contrib/libcxx/include/__mutex_base:140: DB::MergeTreeData::renameTempPartAndReplace(std::__1::shared_ptrDB::IMergeTreeDataPart&, SimpleIncrement*, DB::MergeTreeData::Transaction*) @ 0x1624c74c in /usr/lib/debug/usr/bin/clickhouse
- /build/obj-x86_64-linux-gnu/../contrib/libcxx/include/vector:555: DB::StorageReplicatedMergeTree::executeReplaceRange(DB::ReplicatedMergeTreeLogEntry const&) @ 0x16080e04 in /usr/lib/debug/usr/bin/clickhouse
- /build/obj-x86_64-linux-gnu/../src/Storages/StorageReplicatedMergeTree.cpp:1259: DB::StorageReplicatedMergeTree::executeLogEntry(DB::ReplicatedMergeTreeLogEntry&) @ 0x160abd95 in /usr/lib/debug/usr/bin/clickhouse
- /build/obj-x86_64-linux-gnu/../src/Storages/StorageReplicatedMergeTree.cpp:2575: DB::StorageReplicatedMergeTree::queueTask()::'lambda0'(std::__1::shared_ptrDB::ReplicatedMergeTreeLogEntry&)::operator()(std::__1::shared_ptrDB::ReplicatedMergeTreeLogEntry&) const (.isra.0) @ 0x160ac15d in /usr/lib/debug/usr/bin/clickhouse
- /build/obj-x86_64-linux-gnu/../src/Storages/MergeTree/ReplicatedMergeTreeQueue.cpp:1280: DB::ReplicatedMergeTreeQueue::processEntry(std::__1::function<std::__1::shared_ptrzkutil::ZooKeeper ()>, std::__1::shared_ptrDB::ReplicatedMergeTreeLogEntry&, std::__1::function<bool (std::__1::shared_ptrDB::ReplicatedMergeTreeLogEntry&)>) @ 0x16403572 in /usr/lib/debug/usr/bin/clickhouse
- /build/obj-x86_64-linux-gnu/../contrib/libcxx/include/functional:1825: DB::StorageReplicatedMergeTree::queueTask() @ 0x1605ca9e in /usr/lib/debug/usr/bin/clickhouse
- /build/obj-x86_64-linux-gnu/../contrib/libcxx/include/functional:1867: DB::BackgroundProcessingPool::workLoopFunc() @ 0x161e0723 in /usr/lib/debug/usr/bin/clickhouse
- /build/obj-x86_64-linux-gnu/../src/Common/ThreadPool.h:170: ThreadFromGlobalPool::ThreadFromGlobalPool<DB::BackgroundProcessingPool::BackgroundProcessingPool(int, DB::BackgroundProcessingPool::PoolSettings const&, char const*, char const*)::'lambda'()>(DB::BackgroundProcessingPool::BackgroundProcessingPool(int, DB::BackgroundProcessingPool::PoolSettings const&, char const*, char const*)::'lambda'()&&)::'lambda'()::operator()() const @ 0x161e1062 in /usr/lib/debug/usr/bin/clickhouse
- /build/obj-x86_64-linux-gnu/../contrib/libcxx/include/atomic:856: ThreadPoolImplstd::__1::thread::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0xe767b47 in /usr/lib/debug/usr/bin/clickhouse
- /build/obj-x86_64-linux-gnu/../contrib/libcxx/include/memory:2615: void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_deletestd::__1::__thread_struct >, void ThreadPoolImplstd::__1::thread::scheduleImpl
(std::__1::function<void ()>, int, std::__1::optional
)::'lambda1'()> >(void*) @ 0xe766093 in /usr/lib/debug/usr/bin/clickhouse
- start_thread @ 0x740b in /usr/lib64/libpthread-2.26.so
- __GI___clone @ 0xeceef in /usr/lib64/libc-2.26.so (version 20.9.4.76 (official build))
6、尝试过这个方案:https://github.com/ClickHouse/ClickHouse/issues/14582,不行