Fix pagetypeinfo behavior #39985

Merged
merged 5 commits on Jun 24, 2024
1 change: 1 addition & 0 deletions CHANGELOG.next.asciidoc
@@ -162,6 +162,7 @@ https://github.com/elastic/beats/compare/v8.8.1\...main[Check the HEAD diff]
- Fix issue where beats may report incorrect metrics for its own process when running inside a container {pull}39627[39627]
- Fix for MySQL/Performance - Query failure for MySQL versions below v8.0.1, for performance metric `quantile_95`. {pull}38710[38710]
- Normalize AWS RDS CPU Utilization values before making the metadata API call. {pull}39664[39664]
- Fix behavior of pagetypeinfo metrics {pull}39985[39985]
- Fix query logic for temp and non-temp tablespaces in Oracle module. {issue}38051[38051] {pull}39787[39787]

*Osquerybeat*
3 changes: 2 additions & 1 deletion metricbeat/module/linux/pageinfo/_meta/docs.asciidoc
@@ -1,4 +1,4 @@
The pageinfo metricset reports on paging statistics as found in /proc/pagetypeinfo
The pageinfo metricset reports on paging statistics as found in `/proc/pagetypeinfo`


Reported metrics are broken down by page type: DMA, DMA32, Normal, and Highmem. These types are further broken down by order, which represents zones of 2^ORDER*PAGE_SIZE.
@@ -7,3 +7,4 @@ These metrics are divided into two reporting types: `buddyinfo`, which is summar

This information can be used to determine memory fragmentation. The kernel https://www.kernel.org/doc/gorman/html/understand/understand009.html[buddy algorithm] will always search for the smallest page order to allocate, and if none is available, a larger page order will be split into two "buddies." When memory is freed, the kernel will attempt to merge the "buddies." If the only available pages are at lower orders, this indicates fragmentation, as buddy pages cannot be merged.

Note that page counts from `/proc/pagetypeinfo` will only display values up to 100,000.
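The capping behavior described above is what this PR handles: the kernel prints counts above 100,000 as `>100000`. As a minimal sketch of the trimming approach the diff takes (the `parseCount` helper here is illustrative, not part of the module):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCount converts a single count token from /proc/pagetypeinfo.
// The kernel caps its free_list iteration, so counts above 100,000
// are printed as ">100000"; we trim the marker and keep the floor value.
func parseCount(token string) (int64, error) {
	token = strings.TrimPrefix(token, ">")
	return strconv.ParseInt(token, 10, 64)
}

func main() {
	for _, tok := range []string{"70473", ">100000"} {
		n, err := parseCount(tok)
		if err != nil {
			panic(err)
		}
		fmt.Println(n) // the capped token yields 100000
	}
}
```

The reported metric is therefore a lower bound whenever the marker appears, which is acceptable for a fragmentation signal.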
5 changes: 4 additions & 1 deletion metricbeat/module/linux/pageinfo/pageinfo.go
@@ -24,6 +24,7 @@ import (

"github.com/elastic/beats/v7/libbeat/common/cfgwarn"
"github.com/elastic/beats/v7/metricbeat/mb"
"github.com/elastic/elastic-agent-libs/logp"
"github.com/elastic/elastic-agent-libs/mapstr"
"github.com/elastic/elastic-agent-system-metrics/metric/system/resolve"
)
@@ -43,6 +44,7 @@ func init() {
type MetricSet struct {
mb.BaseMetricSet
mod resolve.Resolver
log *logp.Logger
}

// New creates a new instance of the MetricSet. New is responsible for unpacking
@@ -55,6 +57,7 @@ func New(base mb.BaseMetricSet) (mb.MetricSet, error) {
return &MetricSet{
BaseMetricSet: base,
mod: sys,
log: logp.NewLogger("pageinfo"),
}, nil
}

@@ -72,7 +75,7 @@ func (m *MetricSet) Fetch(report mb.ReporterV2) error {

reader := bufio.NewReader(fd)

zones, err := readPageFile(reader)
zones, err := readPageFile(m.log, reader)
if err != nil {
return fmt.Errorf("error reading pagetypeinfo: %w", err)
}
15 changes: 15 additions & 0 deletions metricbeat/module/linux/pageinfo/pageinfo_test.go
@@ -18,15 +18,30 @@
package pageinfo

import (
"bufio"
"os"
"testing"

"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"

mbtest "github.com/elastic/beats/v7/metricbeat/mb/testing"
_ "github.com/elastic/beats/v7/metricbeat/module/linux"
"github.com/elastic/elastic-agent-libs/logp"
"github.com/elastic/elastic-agent-libs/mapstr"
)

func TestFileRead(t *testing.T) {
fd, err := os.Open("./testdata/pagetypeinfo")
require.NoError(t, err)

reader := bufio.NewReader(fd)

zones, err := readPageFile(logp.L(), reader)
require.NoError(t, err)
require.Equal(t, int64(100000), zones.Zones[0].Normal["Movable"][1])
}

func TestData(t *testing.T) {
f := mbtest.NewReportingMetricSetV2Error(t, getConfig())
err := mbtest.WriteEventsReporterV2Error(f, t, ".")
50 changes: 45 additions & 5 deletions metricbeat/module/linux/pageinfo/reader.go
@@ -19,10 +19,14 @@ package pageinfo

import (
"bufio"
"errors"
"fmt"
"io"
"regexp"
"strconv"
"strings"

"github.com/elastic/elastic-agent-libs/logp"
)

// zones represents raw pagetypeinfo data
@@ -46,11 +50,28 @@ type pageInfo struct {
Zones map[int64]zones
}

var pageinfoLine = regexp.MustCompile(`Node\s*(\d), zone\s*([a-zA-z0-9]*), type\s*([a-zA-z0-9]*)\s*(\d*)\s*(\d*)\s*(\d*)\s*(\d*)\s*(\d*)\s*(\d*)\s*(\d*)\s*(\d*)\s*(\d*)\s*(\d*)\s*(\d*)`)
var pageinfoLine = regexp.MustCompile(`Node\s*(\d), zone\s*([a-zA-z0-9]*), type\s*([a-zA-z0-9]*)\s*(>?\d*)\s*(>?\d*)\s*(>?\d*)\s*(>?\d*)\s*(>?\d*)\s*(>?\d*)\s*(>?\d*)\s*(>?\d*)\s*(>?\d*)\s*(>?\d*)\s*(>?\d*)`)

// readPageFile reads a PageTypeInfo file and returns the parsed data
// This returns a massive representation of all the meaningful data in /proc/pagetypeinfo
func readPageFile(reader *bufio.Reader) (pageInfo, error) {
//
// the actual numbers in pagetypeinfo follow the same format as /proc/buddyinfo,
// but broken down by node and ability to move
// see https://www.kernel.org/doc/Documentation/filesystems/proc.txt:
/*
> cat /proc/buddyinfo

Node 0, zone DMA 0 4 5 4 4 3 ...
Node 0, zone Normal 1 0 0 1 101 8 ...
Node 0, zone HighMem 2 0 0 1 1 0 ...

Each column represents the number of pages of a certain order which are
available. In this case, there are 0 chunks of 2^0*PAGE_SIZE available in
ZONE_DMA, 4 chunks of 2^1*PAGE_SIZE in ZONE_DMA, 101 chunks of 2^4*PAGE_SIZE
available in ZONE_NORMAL, etc...
*/

func readPageFile(log *logp.Logger, reader *bufio.Reader) (pageInfo, error) {
nodes := make(map[int64]zones)

buddy := buddyInfo{
@@ -62,7 +83,7 @@
for {
raw, err := reader.ReadString('\n')

if err == io.EOF || err == io.ErrUnexpectedEOF {
if errors.Is(err, io.EOF) || errors.Is(err, io.ErrUnexpectedEOF) {
break
}

@@ -97,9 +118,28 @@
migrateType = string(match[3])
//Iterate over the order counts
for order, count := range match[4:] {
zoneOrders[order], err = strconv.ParseInt(string(count), 10, 64)
// zone count will produce numbers like this:
// >100000
// we need to catch that.
// for more context on why this happens, see the comment in mm/vmstat.c:
/*
* Cap the free_list iteration because it might
* be really large and we are under a spinlock
* so a long time spent here could trigger a
* hard lockup detector. Anyway this is a
* debugging tool so knowing there is a handful
* of pages of this order should be more than
* sufficient.
*/
strCount := string(count)
if strings.Contains(strCount, ">") {
log.Debugf("got imprecise value '%s' in node %d", strCount, nodeLevel)
// make no assumptions, trim the value and pass it on
strCount = strings.Trim(strCount, ">")
}
zoneOrders[order], err = strconv.ParseInt(strCount, 10, 64)
if err != nil {
return pageInfo{}, fmt.Errorf("error parsing zone: %s: %w", string(count), err)
return pageInfo{}, fmt.Errorf("error parsing zone: %s: %w", strCount, err)
}
nodes[nodeLevel].OrderSummary[order] += zoneOrders[order]
if zoneType == "DMA" {
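The loosened regular expression above is the core of the fix: each count group now optionally accepts a leading `>` so capped lines still match. A self-contained sketch exercising the exact pattern from the diff against the capped line from the test data:

```go
package main

import (
	"fmt"
	"regexp"
)

// Pattern copied verbatim from the updated reader.go: the (>?\d*)
// groups allow kernel-capped values such as ">100000" to match.
var pageinfoLine = regexp.MustCompile(`Node\s*(\d), zone\s*([a-zA-z0-9]*), type\s*([a-zA-z0-9]*)\s*(>?\d*)\s*(>?\d*)\s*(>?\d*)\s*(>?\d*)\s*(>?\d*)\s*(>?\d*)\s*(>?\d*)\s*(>?\d*)\s*(>?\d*)\s*(>?\d*)\s*(>?\d*)`)

func main() {
	line := "Node    0, zone   Normal, type      Movable     10 >100000  70473   3338    115    109     14     60     55      1      0"
	match := pageinfoLine.FindStringSubmatch(line)
	if match == nil {
		panic("line did not match")
	}
	fmt.Println(match[2], match[3]) // zone and migrate type
	fmt.Println(match[5])           // the capped order-1 count, still ">100000"
}
```

With the old pattern, the `(\d*)` group matched zero digits at the `>` and shifted every subsequent capture, which is why the counts were wrong before the trim-and-parse step was added.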
24 changes: 24 additions & 0 deletions metricbeat/module/linux/pageinfo/testdata/pagetypeinfo
@@ -0,0 +1,24 @@
Page block order: 9
Pages per block: 512

Free pages count per migrate type at order 0 1 2 3 4 5 6 7 8 9 10
Node 0, zone DMA, type Unmovable 0 0 0 1 1 1 1 1 0 0 0
Node 0, zone DMA, type Movable 0 0 0 0 0 0 0 0 0 1 2
Node 0, zone DMA, type Reclaimable 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA, type Isolate 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA32, type Unmovable 475 434 197 179 68 32 16 12 10 0 0
Node 0, zone DMA32, type Movable 12157 6536 1695 225 79 49 379 88 28 1 0
Node 0, zone DMA32, type Reclaimable 191 92 537 141 12 1 1 1 1 0 0
Node 0, zone DMA32, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA32, type Isolate 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone Normal, type Unmovable 66 160 177 31 10 10 7 1 0 0 0
Node 0, zone Normal, type Movable 10 >100000 70473 3338 115 109 14 60 55 1 0
Node 0, zone Normal, type Reclaimable 994 603 162 16 1 0 0 0 0 0 0
Node 0, zone Normal, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone Normal, type Isolate 0 0 0 0 0 0 0 0 0 0 0

Number of blocks type Unmovable Movable Reclaimable HighAtomic Isolate
Node 0, zone DMA 3 5 0 0 0
Node 0, zone DMA32 29 1405 94 0 0
Node 0, zone Normal 841 29689 702 0 0