I'm dealing with a time series that consists of one year's data of the result


Question

I'm dealing with a time series that consists of one year's worth of results from some measurement. The time index is in seconds and each measurement result is a non-negative integer.

Interestingly, after some tweaking, transforming, and plotting of the original data, I found that on some days the data shows a periodic pattern within that day.

For instance, I found that on one day a specific measurement (integer) is recorded roughly every minute throughout the whole day. The pattern is obvious because those integers are almost identical and spaced at nearly 1-minute intervals, while the other values are more or less random.

Whilst it is relatively easy to spot the pattern, I find it hard to state it precisely. The main issues are:

The "specific" integers vary slightly. Say the majority of the integers are 1700, but occasionally I find 1800, 1600, 1701, etc.
The "specific" integers differ across days and sometimes even within one day.
The interval is not necessarily 1 minute, and it is not the same even within a single day. Say, in the morning the pattern is spaced at 58-second intervals whilst in the afternoon it is spaced at 61-second intervals.
The interval is not very precise either. Say, in the first 2 hours the pattern is spaced at 60-second intervals, then a 61-second interval occurs, and after that it goes back to 60-second intervals again.

Because the data is huge, I want to see whether I can programmatically identify the pattern (if one exists for a given day) and also retrieve the "specific" integers in the pattern.

Can anyone suggest an approach or offer any guidance?

Thanks.

Explanation / Answer

One way to detect a pattern is to write a filter that scans the entire sequence and returns true/false at each point. True means that the pattern was recognized, false means that it was not found. Count the number of "true" values and divide by the entire length to see the fraction of times that the pattern was found.

For example, one filter would take three parameters: f, the expected frequency, i.e. the spacing between repeats (60 seconds); w, the allowed width (2 seconds means 58-62 seconds); and d, the allowable difference between values.

In Python code:

f = 60  # frequency, every 60 points
w = 2   # width: look back f +/- w points
d = 10  # delta, allowable difference between values

points = [... list of data points ...]
n = len(points)

found = 0  # start counting the number of matches
# main loop: start at index f+w so we can look back that far
# (the last valid index is n-1, so the range stops at n)
for p1 in range(f + w, n):
    # compare point p1 with each point p0, looking back (f +/- w)
    for p0 in range(p1 - f - w, p1 - f + w + 1):
        if abs(points[p1] - points[p0]) < d:
            found += 1  # filter found a match
            break       # exit innermost loop
print(100.0 * found / (n - f - w))  # show the percentage of matches

You can play around with the parameters, and maybe use d as a percentage instead of an absolute value.
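As a rough illustration of the percentage idea, here is a minimal sketch of the same filter with a relative tolerance. The function name count_matches, the parameter rel, and the synthetic series are assumptions made for this example only; the series is assumed to hold one point per second, with 0 where nothing was recorded.

```python
def count_matches(points, f=60, w=2, rel=0.05):
    """Count points that have a similar value about f points earlier.

    Two values are "similar" when they differ by less than rel times
    the larger of the two (a relative tolerance instead of a fixed d).
    """
    n = len(points)
    found = 0
    for p1 in range(f + w, n):
        for p0 in range(p1 - f - w, p1 - f + w + 1):
            # tolerance scales with the magnitude of the values compared
            tol = rel * max(abs(points[p1]), abs(points[p0]))
            if abs(points[p1] - points[p0]) < tol:
                found += 1
                break
    return found


# Hypothetical data: 1700 every 60 points, one outlier of 1800, 0 elsewhere.
series = [0] * 600
for i in range(0, 600, 60):
    series[i] = 1700
series[300] = 1800

print(count_matches(series, rel=0.05))  # 6: the 1800 is outside 5%
print(count_matches(series, rel=0.10))  # 8: the 1800 is within 10%
```

Because the comparison is strict, two zeros never match each other here (the tolerance is 0), so empty stretches of the series do not inflate the count.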

For more complex pattern recognition, you will need to write more complex filters. This is a start.
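One possible next step, sketched under assumptions, is to record which values matched so the "specific" integers can be read off afterwards. This version assumes the data is stored as two parallel lists, sorted timestamps in seconds and the value recorded at each one, rather than one point per second. The name periodic_values and the sample data are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import Counter


def periodic_values(times, values, f=60, w=2, d=10):
    """Tally values that recur about f seconds after a similar value.

    times  -- sorted timestamps in seconds (assumed layout)
    values -- the measurement recorded at each timestamp
    Assumes f > w, so the look-back window lies strictly in the past.
    """
    matched = Counter()
    for t, v in zip(times, values):
        # indices of events whose timestamp lies in [t-f-w, t-f+w]
        lo = bisect_left(times, t - f - w)
        hi = bisect_right(times, t - f + w)
        if any(abs(v - values[j]) < d for j in range(lo, hi)):
            matched[v] += 1
    return matched


# Hypothetical sample: a ~60-second pattern around 1700 plus unrelated events.
times = [0, 30, 60, 95, 120, 181, 200, 241, 301]
values = [1700, 5, 1700, 900, 1701, 1700, 42, 1700, 1700]

counts = periodic_values(times, values)
print(counts.most_common(1))  # [(1700, 4)]
```

Running this once per day of data and looking at the most common matched values would give a candidate "specific" integer for each day; a day with few or no matches would suggest that no pattern is present under the chosen f, w, and d.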
