Intelligent inspection is a preventive maintenance measure that checks the status of the system and identifies potential problems promptly. The intelligent inspection system generates reports based on inspection items so that you can detect and address issues early. This reduces system failures and downtime and improves system stability and reliability.
The following is an overview of the intelligent inspection functionality.
| Item | Description |
|---|---|
| Inspection items | Intelligent inspection currently covers compute group system load, compute group query load, and dedicated servers. The specific inspection items are listed in the inspection contents description under task creation. |
| Inspection types | Manual inspection and periodic inspection are supported. |
| Inspection results | After an inspection task is completed, the system generates an inspection report. The report helps users analyze system issues by inspection item, severity level, and extent of abnormal impact, and provides recommended actions. |
| Resource usage | Intelligent inspection consumes resources in the current environment and affects performance. Run it during off-peak hours to avoid impacting business operations. |
Log in to the ByteHouse console and switch to the target environment. On the Diagnostics & Optimization tab, open the Intelligent Inspection page, select the Inspection Tasks tab, and click Create Inspection Task.
On the Create Inspection Task page, enter basic task information, select the inspection type, and view the inspection content.
Warning: Intelligent inspection consumes resources in the current environment. Run it during off-peak hours to avoid impacting business operations.
Inspection task configuration
| Parameter | Description |
|---|---|
| Inspection task name | A custom name for the inspection task. |
| Inspection type | Manual inspection or periodic inspection. |
Inspection contents description
| Category | Inspection item | Description and risk level determination |
|---|---|---|
| Compute group system load | VW (compute group) expiration time | Days remaining until a subscription-based compute group expires. The risk level is determined from the number of days remaining. |
| Compute group system load | Peak CPU usage in the past day | Peak CPU usage, measured with a 30-minute averaging window. The risk level is determined from this peak value. |
| Compute group system load | Peak memory usage in the past day | Peak memory usage, measured with a 30-minute averaging window. The risk level is determined from this peak value. |
| Compute group system load | Peak inode usage in the past day | Peak inode usage, measured with a 1-minute averaging window. The risk level is determined from this peak value. |
| Compute group system load | Peak cache usage in the past day | Peak cache usage, measured with a 1-minute averaging window. The risk level is determined from this peak value. |
| Compute group query load | Insert success rate in the past day | Calculated as successful insert queries / all insert queries. The risk level is determined from this rate. |
| Compute group query load | Select success rate in the past day (%) | Calculated as successful queries / all queries. The risk level is determined from this rate. |
| Dedicated server | Dedicated server expiration time | Days remaining until a subscription-based dedicated server expires. The risk level is determined from the number of days remaining. |
| Dedicated server | Peak CPU usage in the past day | Peak CPU usage, measured with a 10-minute averaging window. The risk level is determined from this peak value. |
| Dedicated server | Peak memory usage in the past day | Peak memory usage, measured with a 10-minute averaging window. The risk level is determined from this peak value. |
| Data tables | Number of tables with unhealthy partitions | The number of tables in the current environment that contain unhealthy partitions. The risk level is determined from this count. ByteHouse provides a dedicated partition health diagnosis feature. For more information about table partition health, see Partition Health Diagnosis. |
| Gateway connection count | Number of gateway connections in the past day | Calculated as current TCP connections / TCP connection limit. The risk level is determined from this ratio. |
| API Key | Remaining validity period | Days remaining until API keys expire, checked across all users. The risk level is determined from the number of days remaining. |
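The windowed peak and success rate definitions in the table above can be illustrated with a short Python sketch. It is not ByteHouse code: the function names `peak_windowed_average` and `success_rate` are hypothetical, the samples are assumed to arrive at a fixed interval (one per minute here), and the window is assumed to slide one sample at a time, which the documentation does not specify. No risk thresholds are shown because the table does not list them.

```python
from typing import Optional, Sequence


def peak_windowed_average(samples: Sequence[float], window: int) -> float:
    """Return the largest mean over any `window` consecutive samples.

    Assumption: a sliding window that advances one sample at a time.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if len(samples) < window:
        raise ValueError("need at least `window` samples")
    running = sum(samples[:window])      # sum of the first window
    peak = running / window
    for i in range(window, len(samples)):
        # Slide the window: add the newest sample, drop the oldest.
        running += samples[i] - samples[i - window]
        peak = max(peak, running / window)
    return peak


def success_rate(successful: int, total: int) -> Optional[float]:
    """Return successful / total, or None when no queries were run."""
    if total == 0:
        return None
    return successful / total


# Hypothetical per-minute CPU usage samples (percent).
cpu = [10, 20, 90, 90, 30, 10]
print(peak_windowed_average(cpu, 3))   # 70.0, lower than the raw maximum of 90
print(success_rate(980, 1000))         # 0.98
```

Averaging over a window before taking the peak means that a short spike counts for less than sustained load, which is presumably why the longer 30-minute and 10-minute windows are used for CPU and memory.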
Click OK. The system creates the inspection task.
Created tasks appear in the inspection task list. To find a specific task, enter its name in the inspection task list.
For tasks where the inspection type is set to manual, you can manually trigger task execution as needed.
In the inspection task list, click the Execute now button in the Actions column. In the dialog box that appears, click Confirm. The system then runs the inspection.
After the system displays the message "Inspection task executed successfully", you can view the inspection task execution results on the inspection report page.
You can view the generated inspection report on the inspection report page to see the inspection results, impacts of any abnormalities, and recommended actions.
On the inspection report page, you can view the list of generated inspection reports. Click an Inspection Report ID or the View Report button in the Actions column to view report details.
The inspection report displays inspection details, including inspection items, inspection values, severity level, inspection category, abnormal impact, and handling recommendations. Use the inspection values, severity level, and impact to decide whether an abnormality needs to be handled.
On the inspection report page, click the Download Report button in the Actions column to download that report.
You can also click Inspection Report ID to open the details page, and then click Download Report.
To delete an inspection report, go to the inspection report page and click the Delete button in the Actions column for that report.