VM Watch, currently in preview, is a standardized, portable, and flexible in-VM service for virtual machines (VMs) and virtual machine scale sets (VMSS). At configurable intervals, it runs health checks inside the VM and sends the results to Azure using a standard data model. Azure’s production monitoring AIOps (AI for IT operations) engines use these health results to detect and stop regressions. VM Watch is delivered through the Application Health VM extension, which makes it convenient to deploy and manage, and it is offered to customers at no extra cost.
Details of VM Watch monitoring
- Simple adoption: VM Watch is made available through the Application Health VM extension.
- Flexible deployment: Users can easily enable VM Watch using an ARM template, PowerShell, or the Azure CLI.
- Compatibility: VM Watch runs on both Windows and Linux, and it can be used with both individual VMs and virtual machine scale sets (VMSS).
- Resource governance: VM Watch is designed to monitor efficiently without degrading system performance. To protect the VM, caps are applied to the CPU and memory usage of the VM Watch process itself.
- Ready out of the box: VM Watch includes a set of default tests that can be readily modified for scenario-specific testing. Detailed information on the tests (checks, metrics, and event logs) follows.
Network:
Signal Name | Type | Description |
---|---|---|
Outbound connectivity | Check | Verify outbound network connectivity from the Azure VM. |
DNS Resolution | Check | Verify whether the DNS name(s) can be resolved. |
SegmentsRetransmitted | Metric | The number of TCP segments transmitted containing one or more previously transmitted octets. |
NormalizedSegmentsRetransmitted | Metric | SegmentsRetransmitted / (SegmentsSent + SegmentsReceived) |
ConnectionResets | Metric | Number of times TCP connections have made a direct transition to the CLOSED state from either the ESTABLISHED state or the CLOSE_WAIT state. |
NormalizedConnectionResets | Metric | ConnectionResets / CurrentConnections |
FailedConnectionAttempts | Metric | Number of times TCP connections have made a direct transition to the CLOSED state from either the SYN_SENT state or the SYN_RCVD state. |
NormalizedFailedConnectionAttempts | Metric | FailedConnectionAttempts / (ActiveConnectionOpenings + PassiveConnectionOpenings) |
ActiveConnectionOpenings | Metric | Number of times TCP connections have made a direct transition to the SYN_SENT state from the CLOSED state. |
PassiveConnectionOpenings | Metric | Number of times TCP connections have made a direct transition to the SYN_RCVD state from the LISTEN state. |
CurrentConnections | Metric | Number of connections established. |
SegmentsReceived | Metric | Number of segments received, including those received in error. |
SegmentsSent | Metric | Number of segments sent, including those on current connections but excluding those containing only retransmitted octets. |
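The three normalized network metrics are plain ratios of the raw TCP counters in the table above. The Python sketch below computes them from one snapshot of counters. The `TcpCounters` container, the choice to report 0.0 when a denominator is zero, and the sample values are illustrative assumptions; the source does not say whether VM Watch uses cumulative counters or per-interval deltas.

```python
from dataclasses import dataclass


@dataclass
class TcpCounters:
    """Raw TCP counters named after the table rows (hypothetical container)."""
    segments_sent: int
    segments_received: int
    segments_retransmitted: int
    connection_resets: int
    current_connections: int
    failed_connection_attempts: int
    active_connection_openings: int
    passive_connection_openings: int


def _ratio(numerator: float, denominator: float) -> float:
    # Assumption: an idle VM (zero denominator) reports 0.0 instead of failing.
    return numerator / denominator if denominator else 0.0


def normalized_metrics(c: TcpCounters) -> dict[str, float]:
    """Apply the normalization formulas listed in the Network table."""
    return {
        "NormalizedSegmentsRetransmitted": _ratio(
            c.segments_retransmitted, c.segments_sent + c.segments_received
        ),
        "NormalizedConnectionResets": _ratio(
            c.connection_resets, c.current_connections
        ),
        "NormalizedFailedConnectionAttempts": _ratio(
            c.failed_connection_attempts,
            c.active_connection_openings + c.passive_connection_openings,
        ),
    }


if __name__ == "__main__":
    sample = TcpCounters(
        segments_sent=9_000,
        segments_received=11_000,
        segments_retransmitted=40,
        connection_resets=5,
        current_connections=50,
        failed_connection_attempts=3,
        active_connection_openings=20,
        passive_connection_openings=10,
    )
    print(normalized_metrics(sample))
```

Normalizing by traffic volume or connection count makes the values comparable between a busy VM and a quiet one, which raw counters are not.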
Disk:
Signal Name | Type | Description |
---|---|---|
Azure Disk I/O | Check | Verify file create, write, read, and delete operations on each drive mounted to the VM
FreeSpaceInBytes | Metric | The free disk space of the target mount point |
UsedSpaceInBytes | Metric | The used disk space of the target mount point |
CapacityInBytes | Metric | The disk space capacity of the target mount point |
UsedPercent | Metric | The used disk space percentage of the target mount point |
WriteOps | Metric | The write operations per second of the target disk/partition |
ReadOps | Metric | The read operations per second of the target disk/partition |
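As a rough illustration of what the Azure Disk I/O check and the space metrics measure, here is a Python sketch (an assumption about the approach, not VM Watch's actual implementation) that creates, writes, reads back, and deletes a probe file on a mount point, then reports space figures from `shutil.disk_usage`. On Linux, `free` is the space available to unprivileged users, so used plus free can be less than the capacity. WriteOps and ReadOps are not covered here.

```python
import os
import shutil
import tempfile


def disk_io_check(mount_point: str) -> bool:
    """Create, write, read back, and delete a small probe file under mount_point."""
    payload = b"vmwatch-probe"  # arbitrary probe content (assumption)
    try:
        fd, path = tempfile.mkstemp(dir=mount_point)  # create
        try:
            with os.fdopen(fd, "wb") as f:  # write and flush to the device
                f.write(payload)
                f.flush()
                os.fsync(f.fileno())
            with open(path, "rb") as f:  # read back and compare
                ok = f.read() == payload
        finally:
            os.remove(path)  # delete, even if writing or reading failed
        return ok
    except OSError:
        return False


def disk_space_metrics(mount_point: str) -> dict[str, float]:
    """Report the space metrics from the Disk table for one mount point."""
    usage = shutil.disk_usage(mount_point)
    return {
        "FreeSpaceInBytes": usage.free,
        "UsedSpaceInBytes": usage.used,
        "CapacityInBytes": usage.total,
        "UsedPercent": 100.0 * usage.used / usage.total if usage.total else 0.0,
    }


if __name__ == "__main__":
    root = os.path.abspath(os.sep)
    print("I/O check on temp dir:", disk_io_check(tempfile.gettempdir()))
    print(disk_space_metrics(root))
```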
CPU:
Signal Name | Type | Description |
---|---|---|
ProcessCoreUsage | Metric | An instantaneous measurement of the percentage of a single CPU core that the target process is using (100 = 100%, a whole core) |
ProcessMachineUsage | Metric | The percentage of the machine’s total CPU that this process is using |
MachineTotalCpuUsage | Metric | The VM’s total instantaneous CPU utilization |
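ProcessCoreUsage and ProcessMachineUsage differ only in their denominator. Assuming ProcessMachineUsage is ProcessCoreUsage spread across all logical cores (the source does not give the exact formula), the relationship can be sketched in Python as follows. `measure_self` is a hypothetical helper that busy-loops briefly so the measured numbers are non-zero.

```python
import os
import time


def process_core_usage(cpu_seconds: float, wall_seconds: float) -> float:
    """CPU time used over elapsed wall time, as a percent of one core (100 = a whole core)."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return 100.0 * cpu_seconds / wall_seconds


def process_machine_usage(core_usage: float, logical_cores: int) -> float:
    """Rescale per-core usage to a percentage of the machine's total CPU (assumption)."""
    return core_usage / logical_cores


def measure_self(duration: float = 0.2) -> tuple[float, float]:
    """Burn CPU for `duration` seconds and measure this process's own usage."""
    cpu0, wall0 = time.process_time(), time.perf_counter()
    while time.perf_counter() - wall0 < duration:
        pass  # deliberately keep one core busy
    cpu1, wall1 = time.process_time(), time.perf_counter()
    core = process_core_usage(cpu1 - cpu0, wall1 - wall0)
    return core, process_machine_usage(core, os.cpu_count() or 1)


if __name__ == "__main__":
    core, machine = measure_self()
    print(f"core usage: {core:.1f}%, machine usage: {machine:.1f}%")
```

On a four-core VM, a process that keeps one core fully busy would show roughly 100 for ProcessCoreUsage and 25 for ProcessMachineUsage under this assumption.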
Process:
Signal Name | Type | Description |
---|---|---|
Process Creation | Check | Starts a lightweight process to validate that process creation is possible |
Running Process(es) | Check | Verify whether the target process(es) are running
UpTime | Metric | How long the target process has been running since its last startup
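A minimal Python sketch of the Process Creation check and the UpTime metric follows. Launching the current Python interpreter with a no-op script as the "lightweight process" is an assumption made for portability, and `uptime_seconds` is a hypothetical helper that takes the process start time as an epoch timestamp.

```python
import subprocess
import sys
import time
from typing import Optional


def process_creation_check(timeout: float = 10.0) -> bool:
    """Start a short-lived child process and confirm it exits successfully."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", "pass"],  # no-op child process (assumption)
            capture_output=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False  # the OS could not create the process, or it hung
    return result.returncode == 0


def uptime_seconds(start_time: float, now: Optional[float] = None) -> float:
    """Seconds elapsed since the process last started (hypothetical helper)."""
    current = time.time() if now is None else now
    return current - start_time


if __name__ == "__main__":
    print("process creation OK:", process_creation_check())
```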
IMDS:
Signal Name | Type | Description |
---|---|---|
IMDS | Check | Verify that the Instance Metadata Service (IMDS) endpoint is reachable from within the VM and that querying it returns VM information
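IMDS is queried over plain HTTP at the link-local address 169.254.169.254; requests must carry the `Metadata: true` header and must not go through a proxy. The Python sketch below shows one way to perform an IMDS check of this kind. Treating the presence of a `compute` section as "VM information is returned" is an assumption, and the `api-version` value is one published version that you may need to adjust.

```python
import json
import urllib.request

IMDS_URL = "http://169.254.169.254/metadata/instance?api-version=2021-02-01"


def build_imds_request(url: str = IMDS_URL) -> urllib.request.Request:
    # IMDS rejects requests that do not include the "Metadata: true" header.
    return urllib.request.Request(url, headers={"Metadata": "true"})


def imds_check(timeout: float = 2.0) -> bool:
    """Return True if IMDS is reachable and returns VM compute information."""
    # Use an opener with an empty proxy map so the request goes direct.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(build_imds_request(), timeout=timeout) as resp:
            body = json.load(resp)
    except (OSError, ValueError):  # network/HTTP errors or non-JSON response
        return False
    return "compute" in body


if __name__ == "__main__":
    # Returns False when run outside an Azure VM.
    print("IMDS reachable:", imds_check())
```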
Clock:
Signal Name | Type | Description |
---|---|---|
Clock Skew | Check | Verify the clock skew between a remote NTP server and the Azure VM. On Windows VMs, if the remote NTP server is inaccessible, fall back to checking with w32tm whether the Windows Time service is synchronized
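Clock skew against an NTP server is normally derived from four timestamps using the standard NTP offset formula from RFC 5905. The Python sketch below shows only that arithmetic. The `max_skew_seconds` threshold is a hypothetical parameter, since the source does not state what skew VM Watch tolerates, and the UDP exchange itself and the `w32tm` fallback are not shown.

```python
def ntp_offset(t0: float, t1: float, t2: float, t3: float) -> float:
    """Offset of the server clock relative to the local clock (RFC 5905).

    t0: client transmit time   t1: server receive time
    t2: server transmit time   t3: client receive time   (all in seconds)
    A positive result means the server clock is ahead of the VM clock.
    """
    return ((t1 - t0) + (t2 - t3)) / 2.0


def round_trip_delay(t0: float, t1: float, t2: float, t3: float) -> float:
    """Network round-trip time, excluding the server's processing time."""
    return (t3 - t0) - (t2 - t1)


def clock_skew_ok(offset: float, max_skew_seconds: float) -> bool:
    """Pass if the absolute skew is within a hypothetical tolerance."""
    return abs(offset) <= max_skew_seconds


if __name__ == "__main__":
    # Server clock 5 s ahead of the VM, 2 s round trip, instant server reply.
    print(ntp_offset(0.0, 6.0, 6.0, 2.0), round_trip_delay(0.0, 6.0, 6.0, 2.0))
```

Averaging the two one-way differences cancels the network delay as long as the outbound and return paths take about the same time.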
AzBlob:
Signal Name | Type | Description |
---|---|---|
Azure Storage blob connectivity | Check | Verify connectivity to Azure Blob Storage and download the blob using a managed identity (MSI) or a SAS token
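For the SAS-token path of the blob connectivity check, the token is appended as the query string of the blob URL. The Python sketch below builds such a URL and attempts a download. The function names are hypothetical, the `*.blob.core.windows.net` endpoint assumes the Azure public cloud, and the managed identity (MSI) path, which requires an OAuth bearer token, is not shown.

```python
import urllib.parse
import urllib.request


def blob_url(account: str, container: str, blob: str, sas_token: str = "") -> str:
    """Build a blob URL, appending the SAS token as the query string if given."""
    # quote() keeps "/" so virtual-directory blob names stay intact.
    path = f"/{urllib.parse.quote(container)}/{urllib.parse.quote(blob)}"
    url = f"https://{account}.blob.core.windows.net{path}"
    return f"{url}?{sas_token.lstrip('?')}" if sas_token else url


def blob_download_check(url: str, timeout: float = 10.0) -> bool:
    """Return True if the blob can be downloaded over HTTPS."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
        return True
    except OSError:  # covers URLError, HTTPError, and timeouts
        return False
```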
Hardware:
Signal Name | Type | Description |
---|---|---|
Hardware Health Monitor | EventLog | Collect hardware health information from the Windows event log. Currently, only disk-related critical events are collected, including events with IDs 7, 500, 504, 505, 512, and 549
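The event ID filter described for the Hardware Health Monitor can be sketched in Python as below. The event records are hypothetical dictionaries with an `id` key, and reading the actual Windows event log is outside the scope of this sketch.

```python
from collections.abc import Iterable

# Disk-related event IDs listed in the Hardware table.
DISK_EVENT_IDS = frozenset({7, 500, 504, 505, 512, 549})


def filter_disk_events(events: Iterable[dict]) -> list[dict]:
    """Keep only events whose ID is one of the disk-related IDs (hypothetical records)."""
    return [e for e in events if e.get("id") in DISK_EVENT_IDS]


if __name__ == "__main__":
    sample = [{"id": 7}, {"id": 1000}, {"id": 549}]
    print(filter_disk_events(sample))
```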
Learn more from the Microsoft documentation: https://learn.microsoft.com/en-us/azure/virtual-machines/extensions/health-extension?tabs=rest-api