Overview - Lakehouse Analytics Service (LAS) Private Deployment - Volcengine

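HBase basic concepts

HBase is an open-source, non-relational, distributed database modeled on Google's BigTable and implemented in Java. It is part of the Apache Software Foundation's Hadoop project, runs on top of the HDFS file system, and provides BigTable-like services for Hadoop.
Key HBase terms:

Term	Description
Namespace	A namespace is a logical grouping of tables, analogous to a database in a relational database system. This abstraction lays the groundwork for upcoming multi-tenancy features.
Table	Tables are declared up front, at schema definition time.
Row	Row keys are uninterpreted bytes. Rows are sorted lexicographically, with the lowest key appearing first in the table. The empty byte array denotes both the start and the end of a table's row key space.
Column Family	A column family is an important logical structure in Apache HBase. It groups related columns together: all members of a family share the same prefix, and a colon separates the column family from the column qualifier (for example, f1:col1). Column families must be declared up front at schema definition time, whereas individual columns can be added dynamically while the table is running. All members of a column family are stored together on the file system, and storage settings are specified at the column family level, so different families can use different storage parameters. Members of one column family should share the same general access pattern and size characteristics to optimize storage and query performance.
Cell	A cell is the storage unit identified by row key, column family, column qualifier, and timestamp. It holds one specific version of a data value.
Versions	Versions are the values with different timestamps that can be stored at the same row and column. HBase can keep multiple versions of each value, distinguished by timestamp, and the number of versions to retain is configurable.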
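The lexicographic row ordering described in the table can surprise readers who expect numeric ordering. The following is a minimal sketch in plain Java, not HBase client code: the class RowKeyOrder and its methods are hypothetical names introduced here for illustration. It assumes row keys are compared byte by byte as unsigned values, which is what "uninterpreted bytes sorted lexicographically" implies.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative model only; this is not HBase client code.
final class RowKeyOrder {

  // Compares two row keys byte by byte, treating each byte as unsigned (0..255).
  // A key that is a strict prefix of another key sorts first.
  static final Comparator<byte[]> UNSIGNED_LEXICOGRAPHIC = (a, b) -> {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int diff = (a[i] & 0xff) - (b[i] & 0xff);
      if (diff != 0) {
        return diff;
      }
    }
    return a.length - b.length;
  };

  // Returns the given string keys sorted by their UTF-8 bytes.
  static List<String> sortedKeys(List<String> keys) {
    List<byte[]> raw = new ArrayList<>();
    for (String key : keys) {
      raw.add(key.getBytes(StandardCharsets.UTF_8));
    }
    raw.sort(UNSIGNED_LEXICOGRAPHIC);
    List<String> out = new ArrayList<>();
    for (byte[] bytes : raw) {
      out.add(new String(bytes, StandardCharsets.UTF_8));
    }
    return out;
  }

  public static void main(String[] args) {
    // Prints [abc1, row1, row10, row2]
    System.out.println(sortedKeys(Arrays.asList("row2", "row10", "abc1", "row1")));
  }
}
```

Note that "row10" sorts before "row2" because the comparison stops at the first differing byte ('1' versus '2'). Zero-padding numeric suffixes (row02, row10) is one common way to make byte order match numeric order.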

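Data model operations

The four primary data model operations in HBase are Get, Put, Scan, and Delete. Operations are applied through Table instances.

Get

Get returns the attributes of a specified row. Gets are executed via Table.get.

Put

Put either adds a new row to a table (if the key is new) or updates an existing row (if the key already exists). Puts are executed via Table.put (non-writeBuffer) or Table.batch (non-writeBuffer).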
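Because a Put on an existing key writes a new timestamped version instead of overwriting in place, the add-or-update behaviour and the Versions concept are closely linked. The sketch below models a single cell in plain Java. It is not the HBase client API: VersionedCell, put, get, and getAll are hypothetical names, and the model trims old versions eagerly on write as a simplification, which is an assumption and not a statement about when HBase itself discards them.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Illustrative in-memory model only; this is not HBase client code.
final class VersionedCell {
  private final int maxVersions;
  // Versions keyed by timestamp and ordered newest first.
  private final TreeMap<Long, String> versions =
      new TreeMap<Long, String>(Comparator.<Long>reverseOrder());

  VersionedCell(int maxVersions) {
    if (maxVersions < 1) {
      throw new IllegalArgumentException("maxVersions must be at least 1");
    }
    this.maxVersions = maxVersions;
  }

  // Stores a value at the given timestamp, then drops the oldest versions
  // until at most maxVersions remain.
  void put(long timestamp, String value) {
    versions.put(timestamp, value);
    while (versions.size() > maxVersions) {
      versions.pollLastEntry(); // last entry holds the oldest timestamp
    }
  }

  // Returns the value with the newest timestamp, or null if the cell is empty.
  String get() {
    return versions.isEmpty() ? null : versions.firstEntry().getValue();
  }

  // Returns all retained values, newest first.
  List<String> getAll() {
    return new ArrayList<>(versions.values());
  }

  public static void main(String[] args) {
    VersionedCell cell = new VersionedCell(2);
    cell.put(1L, "v1");
    cell.put(3L, "v3");
    cell.put(2L, "v2");
    System.out.println(cell.get());    // prints v3
    System.out.println(cell.getAll()); // prints [v3, v2]
  }
}
```

A read returns the newest version by default, so from the caller's point of view a second Put to the same key looks like an update even though the older value may still be retained.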

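Scan

Scan allows iteration over multiple rows for specified attributes.
The following is an example of a scan on a Table instance. Assume a table is populated with rows whose keys are "row1", "row2", and "row3", and then with another set of rows whose keys are "abc1", "abc2", and "abc3". The example shows how to set up a Scan instance that returns only the rows whose keys begin with "row".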

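import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public static final byte[] CF = Bytes.toBytes("cf");      // column family
public static final byte[] ATTR = Bytes.toBytes("attr");  // column qualifier
...

Table table = ...      // instantiate a Table instance

Scan scan = new Scan();
scan.addColumn(CF, ATTR);                       // return only the cf:attr column
scan.setRowPrefixFilter(Bytes.toBytes("row"));  // match only row keys that start with "row"
// try-with-resources always closes the ResultScanner
try (ResultScanner rs = table.getScanner(scan)) {
  for (Result r : rs) {
    // process result...
  }
}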
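Since rows are stored in sorted order, a prefix scan can be thought of as a range scan from the prefix itself up to the smallest key that no longer starts with that prefix. The sketch below computes that exclusive upper bound in plain Java. PrefixRange and stopRowForPrefix are hypothetical names, and the claim that a prefix scan behaves like this range is a conceptual model assumed here, not a description of HBase internals.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative model only; this is not HBase client code.
final class PrefixRange {

  // Returns the smallest row key that sorts after every key starting with
  // the given prefix. Returns an empty array when no such bound exists
  // (the prefix is empty or consists only of 0xFF bytes), which this
  // sketch uses to mean "scan to the end of the table".
  static byte[] stopRowForPrefix(byte[] prefix) {
    int end = prefix.length;
    // Trailing 0xFF bytes cannot be incremented, so drop them.
    while (end > 0 && prefix[end - 1] == (byte) 0xff) {
      end--;
    }
    if (end == 0) {
      return new byte[0];
    }
    byte[] stop = Arrays.copyOf(prefix, end);
    stop[end - 1]++; // increment the last remaining byte
    return stop;
  }

  public static void main(String[] args) {
    byte[] stop = stopRowForPrefix("row".getBytes(StandardCharsets.UTF_8));
    // Prints rox: keys in ["row", "rox") are exactly those starting with "row"
    System.out.println(new String(stop, StandardCharsets.UTF_8));
  }
}
```

With the keys from the example above, "row1", "row2", and "row3" fall inside the range from "row" (inclusive) to "rox" (exclusive), while "abc1", "abc2", and "abc3" sort before "row" and are skipped.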

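Delete

Delete removes a row from a table. Deletes are executed via Table.delete.
HBase does not modify data in place, so a delete is handled by writing a new marker called a tombstone. These tombstones, together with the data they mask, are cleaned up during major compaction.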
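The tombstone mechanism can be illustrated with a small append-only store in plain Java. This is a simplified model under stated assumptions, not HBase code: TombstoneStore and its methods are hypothetical names, there is one value per row, and there are no timestamps or column families. It shows only the three behaviours described above: a delete appends a marker, reads treat a masked row as absent, and compaction physically removes both the marker and the masked data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Illustrative in-memory model only; this is not HBase client code.
final class TombstoneStore {

  // One immutable log entry: either a value or a delete marker.
  private static final class LogEntry {
    final String rowKey;
    final String value; // null means this entry is a tombstone

    LogEntry(String rowKey, String value) {
      this.rowKey = rowKey;
      this.value = value;
    }
  }

  private final List<LogEntry> log = new ArrayList<>();

  void put(String rowKey, String value) {
    log.add(new LogEntry(rowKey, Objects.requireNonNull(value)));
  }

  // A delete appends a tombstone; earlier entries are left untouched.
  void delete(String rowKey) {
    log.add(new LogEntry(rowKey, null));
  }

  // Replays the log; the last entry for the row wins, so a trailing
  // tombstone makes the row read as absent (null).
  String get(String rowKey) {
    String result = null;
    for (LogEntry entry : log) {
      if (entry.rowKey.equals(rowKey)) {
        result = entry.value;
      }
    }
    return result;
  }

  // Rewrites the log with only the latest live value per row, dropping
  // tombstones and the entries they mask.
  void compact() {
    Map<String, String> latest = new TreeMap<>();
    for (LogEntry entry : log) {
      if (entry.value == null) {
        latest.remove(entry.rowKey);
      } else {
        latest.put(entry.rowKey, entry.value);
      }
    }
    log.clear();
    for (Map.Entry<String, String> e : latest.entrySet()) {
      log.add(new LogEntry(e.getKey(), e.getValue()));
    }
  }

  // Number of physical entries currently stored, including tombstones.
  int entryCount() {
    return log.size();
  }
}
```

For example, after put("row1", "a"), put("row2", "b"), and delete("row1"), the store holds three physical entries and get("row1") returns null. After compact() a single entry remains, and the reads return the same results as before.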

HBase Snapshot migration

Create a snapshot

Start the interactive HBase Shell and run the following command to create a snapshot of the table named t1:

snapshot 't1','t1_snapshot'

View the snapshot result

list_snapshots

SNAPSHOT                              TABLE + CREATION TIME                                                                                      
 t1_snapshot                          t1 (Thu Nov 03 21:20:51 +0800 2022)                                                                        
1 row(s) in 0.0080 seconds

=> ["t1_snapshot"]

View the snapshot files

Exit HBase Shell with quit, then run the following HDFS command. The listing includes the snapshot directory named .hbase-snapshot.

hadoop fs -ls /apps/hbase/data


SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/emr/2.0.0/hadoop-2.10.2/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/emr/2.0.0/tez-0.10.1/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/emr/2.0.0/tez-0.10.1/lib/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
log4j:WARN custom level class [Relative to Yarn Log Dir Prefix] not found.
SLF4J: Actual binding is of type [org.slf4j.impl.Reload4jLoggerFactory]
Found 9 items
drwxr-xr-x   - hbase hdfs          0 2022-11-03 21:24 /apps/hbase/data/.hbase-snapshot
drwxr-xr-x   - hbase hdfs          0 2022-11-03 10:58 /apps/hbase/data/.tmp
drwxr-xr-x   - hbase hdfs          0 2022-11-03 21:15 /apps/hbase/data/MasterProcWALs
drwxr-xr-x   - hbase hdfs          0 2022-11-03 10:58 /apps/hbase/data/WALs
drwxr-xr-x   - hbase hdfs          0 2022-11-03 10:58 /apps/hbase/data/corrupt
drwxr-xr-x   - hbase hdfs          0 2022-11-03 10:58 /apps/hbase/data/data
-rw-r--r--   2 hbase hdfs         42 2022-11-03 10:54 /apps/hbase/data/hbase.id
-rw-r--r--   2 hbase hdfs          7 2022-11-03 10:54 /apps/hbase/data/hbase.version
drwxr-xr-x   - hbase hdfs          0 2022-11-03 21:17 /apps/hbase/data/oldWALs

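Migrate snapshot files with the Snapshot tool

Migrate by command

If Ranger permission management is enabled on the LAS cluster, use the Ranger UI to grant the HBase user (the commands below run as the HBase user by default) access to the relevant HDFS paths and permission to submit jobs to YARN.
Export the snapshot to the HDFS directory /tmp/20221103: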

hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot t1_snapshot -copy-from hdfs:///apps/hbase/data -copy-to hdfs:///tmp/20221103

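Check the export result. There are two directories: .hbase-snapshot holds the snapshot metadata, and archive holds the data files that the snapshot references.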

hadoop fs -ls /tmp/20221103

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/emr/2.0.0/hadoop-2.10.2/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/emr/2.0.0/tez-0.10.1/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/emr/2.0.0/tez-0.10.1/lib/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
log4j:WARN custom level class [Relative to Yarn Log Dir Prefix] not found.
SLF4J: Actual binding is of type [org.slf4j.impl.Reload4jLoggerFactory]
Found 2 items
drwxr-xr-x   - root hdfs          0 2022-11-03 21:26 /tmp/20221103/.hbase-snapshot // snapshot metadata
drwxr-xr-x   - root hdfs          0 2022-11-03 21:26 /tmp/20221103/archive // data files referenced by the snapshot

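Snapshot files

Send the snapshot files directly to the HDFS directory of another HBase cluster.
Taking another HBase cluster whose master node is emr-4dh2cu897xxxxxxx-master-1 as an example, run the following command: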

hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot t1_snapshot -copy-to hdfs://emr-4dh2cu897xxxxxxx-master-1:8020/apps/hbase/data

Then log in to the emr-4dh2cu897xxxxxxx-master-1 node and check the result with the HDFS command. For how to log in, see "Log in to the cluster".

hadoop fs -ls /apps/hbase/data
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/emr/2.0.0/hadoop-2.10.2/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/emr/2.0.0/tez-0.10.1/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/emr/2.0.0/tez-0.10.1/lib/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
log4j:WARN custom level class [Relative to Yarn Log Dir Prefix] not found.
SLF4J: Actual binding is of type [org.slf4j.impl.Reload4jLoggerFactory]
Found 10 items
drwxr-xr-x   - root  hdfs          0 2022-11-04 10:46 /apps/hbase/data/.hbase-snapshot // snapshot metadata
drwxrwxrwx   - hbase hdfs          0 2022-11-03 16:46 /apps/hbase/data/.tmp
drwxrwxrwx   - hbase hdfs          0 2022-11-04 10:36 /apps/hbase/data/MasterProcWALs
drwxrwxrwx   - hbase hdfs          0 2022-11-02 11:19 /apps/hbase/data/WALs
drwxr-xr-x   - root  hdfs          0 2022-11-04 10:46 /apps/hbase/data/archive  // data files referenced by the snapshot
drwxrwxrwx   - hbase hdfs          0 2022-11-02 11:18 /apps/hbase/data/corrupt
drwxrwxrwx   - hbase hdfs          0 2022-11-02 11:19 /apps/hbase/data/data
-rwxrwxrwx   2 hbase hdfs         42 2022-11-01 20:12 /apps/hbase/data/hbase.id
-rwxrwxrwx   2 hbase hdfs          7 2022-11-01 20:12 /apps/hbase/data/hbase.version
drwxrwxrwx   - hbase hdfs          0 2022-11-04 10:42 /apps/hbase/data/oldWALs

Restore a table from a snapshot

Copy the snapshot files from the HDFS export directory into the HBase data directory:

hadoop fs -cp /tmp/20221103/* /apps/hbase/data

Enter HBase Shell:

hbase shell

First check that the snapshot exists:

list_snapshots


SNAPSHOT                                       TABLE + CREATION TIME                                                                                                                  
 t1_snapshot                                   t1 (Thu Nov 03 11:06:10 +0800 2022)                                                                                                    
1 row(s) in 0.1360 seconds

=> ["t1_snapshot"]

Restore the table from the snapshot:

restore_snapshot 't1_snapshot'

Use list to confirm that the table exists:

list


TABLE                                                                                                                                                                                 
t1                                                                                                                                                                                    
1 row(s) in 0.0040 seconds

=> ["t1"]

Query the data and check that it matches what was inserted earlier:

get 't1','rowkey001', {COLUMN=>'f1:col1'}
COLUMN                                         CELL                                                                                                                                   
 f1:col1                                       timestamp=1667444472819, value=value01                                                                                                 
1 row(s) in 0.0510 seconds

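The second migration method (exporting the snapshot files directly to another cluster) is verified in the same way; run the same HBase commands on the other cluster.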

Development example

Use Java to create, modify, and delete tables:
JDK 1.8 can be used; the HBase version is 2.3.7.

package com.example.hbase.admin;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.io.compress.Compression.Algorithm;

public class Example {

  private static final String TABLE_NAME = "MY_TABLE_NAME_TOO";
  private static final String CF_DEFAULT = "DEFAULT_COLUMN_FAMILY";

  public static void createOrOverwrite(Admin admin, HTableDescriptor table) throws IOException {
    if (admin.tableExists(table.getTableName())) {
      admin.disableTable(table.getTableName());
      admin.deleteTable(table.getTableName());
    }
    admin.createTable(table);
  }

  public static void createSchemaTables(Configuration config) throws IOException {
    try (Connection connection = ConnectionFactory.createConnection(config);
         Admin admin = connection.getAdmin()) {

      HTableDescriptor table = new HTableDescriptor(TableName.valueOf(TABLE_NAME));
      table.addFamily(new HColumnDescriptor(CF_DEFAULT).setCompressionType(Algorithm.NONE));

      System.out.print("Creating table. ");
      createOrOverwrite(admin, table);
      System.out.println(" Done.");
    }
  }

  public static void modifySchema(Configuration config) throws IOException {
    try (Connection connection = ConnectionFactory.createConnection(config);
         Admin admin = connection.getAdmin()) {

      TableName tableName = TableName.valueOf(TABLE_NAME);
      if (!admin.tableExists(tableName)) {
        System.out.println("Table does not exist.");
        System.exit(-1);
      }

      HTableDescriptor table = admin.getTableDescriptor(tableName);

      // Update existing table
      HColumnDescriptor newColumn = new HColumnDescriptor("NEWCF");
      newColumn.setCompactionCompressionType(Algorithm.GZ);
      newColumn.setMaxVersions(HConstants.ALL_VERSIONS);
      admin.addColumn(tableName, newColumn);

      // Update existing column family
      HColumnDescriptor existingColumn = new HColumnDescriptor(CF_DEFAULT);
      existingColumn.setCompactionCompressionType(Algorithm.GZ);
      existingColumn.setMaxVersions(HConstants.ALL_VERSIONS);
      table.modifyFamily(existingColumn);
      admin.modifyTable(tableName, table);

      // Disable an existing table
      admin.disableTable(tableName);

      // Delete an existing column family
      admin.deleteColumn(tableName, CF_DEFAULT.getBytes("UTF-8"));

      // Delete a table (Need to be disabled first)
      admin.deleteTable(tableName);
    }
  }

  public static void main(String... args) throws IOException {
    Configuration config = HBaseConfiguration.create();

    // Add any necessary configuration files (hbase-site.xml, core-site.xml)
    config.addResource(new Path(System.getenv("HBASE_CONF_DIR"), "hbase-site.xml"));
    config.addResource(new Path(System.getenv("HADOOP_CONF_DIR"), "core-site.xml"));
    createSchemaTables(config);
    modifySchema(config);
  }
}

Last updated: 2025.04.01 20:13:40
