Let’s view the preceding code step by step:

  1. We parse the log string into an array of lines, where every line is itself an array of strings. This means that we need to find one separator that splits the log into lines and another that splits a line into segments.
  2. We map the array of segments from each line to an object. This helps us to identify the different parts of the log message (such as the date, error message, IP address, and so on). We also convert the time string to the timestamp of a JavaScript Date object.
  3. We discard all rows that don’t have a valid time attribute.
  4. We group the log data by an interval of minutes.

Of these steps, step 1 is the most difficult; therefore, I will explain it systematically with two example logs.
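Steps 3 and 4, by contrast, do not depend on the log format. The following is only a minimal sketch in plain JavaScript under stated assumptions, not the code discussed above: the function names, the time property holding a millisecond timestamp, and the Map-based result are all hypothetical.

```javascript
// Step 3: keep only rows whose time property is a valid number.
function discardInvalid(rows) {
  return rows.filter(row => typeof row.time === 'number' && !isNaN(row.time));
}

// Step 4: count the rows that fall into each interval of `minutes` minutes.
// Returns a Map from the start of the interval (ms timestamp) to the row count.
function groupByMinutes(rows, minutes) {
  const intervalMs = minutes * 60 * 1000;
  const counts = new Map();
  rows.forEach(row => {
    // Round the timestamp down to the start of its interval.
    const start = Math.floor(row.time / intervalMs) * intervalMs;
    counts.set(start, (counts.get(start) || 0) + 1);
  });
  return counts;
}
```

Rounding the timestamp down to a multiple of the interval length gives every row in the same interval the same key, which is what makes the counts usable as histogram bins.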

First, we will use a MySQL slow query log from the /var/log/mysql directory with the following structure:

# Time: 141129 17:24:37
# User@Host: root[root] @ server.com [172.14.26.38]
# Query_time: 2.240000  Lock_time: 0.000000 Rows_sent: 1 Rows_examined: 2560674
SET timestamp=1334841877;
SELECT  ...;
# Time: 141129 17:24:39
# User@Host: root[root] @ server.com [172.14.26.38]
# Query_time: 1.896000  Lock_time: 0.000000 Rows_sent: 1 Rows_examined: 2560674
SET timestamp=1334841879;
SELECT  ...;

We can split the log string with the /# Time:/ regular expression to generate an array of log entries (the empty element before the first separator and the whitespace around each entry are omitted here):

Array[
 '141129 17:24:37
  # User@Host: root[root] @ server.com [172.14.26.38]
  # Query_time: 2.240000  Lock_time: 0.000000 Rows_sent: 1 Rows_examined: 2560674
  SET timestamp=1334841877;
  SELECT  ...;',
 '141129 17:24:39
  # User@Host: root[root] @ server.com [172.14.26.38]
  # Query_time: 1.896000  Lock_time: 0.000000 Rows_sent: 1 Rows_examined: 2560674
  SET timestamp=1334841879;
  SELECT  ...;'
];
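This first split can be sketched in plain JavaScript as follows; the variable names are illustrative. Because the log begins with the separator, split() returns an empty first element, so the sketch trims every entry and filters out the empty ones.

```javascript
// Sample taken from the slow query log shown above.
const slowLog = [
  '# Time: 141129 17:24:37',
  '# User@Host: root[root] @ server.com [172.14.26.38]',
  '# Query_time: 2.240000  Lock_time: 0.000000 Rows_sent: 1 Rows_examined: 2560674',
  'SET timestamp=1334841877;',
  'SELECT  ...;',
  '# Time: 141129 17:24:39',
  '# User@Host: root[root] @ server.com [172.14.26.38]',
  '# Query_time: 1.896000  Lock_time: 0.000000 Rows_sent: 1 Rows_examined: 2560674',
  'SET timestamp=1334841879;',
  'SELECT  ...;'
].join('\n');

// Split at every '# Time:' marker, strip surrounding whitespace,
// and drop the empty element in front of the first marker.
const entries = slowLog
  .split(/# Time:/)
  .map(entry => entry.trim())
  .filter(entry => entry.length > 0);
```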

Then, every log entry can be split at the newline character with the /\n/ regular expression into individual segments:

Array[
  Array['141129 17:24:37',
  '# User@Host: root[root] @ server.com [172.14.26.38]',
  '# Query_time: 2.240000  Lock_time: 0.000000 Rows_sent: 1 Rows_examined: 2560674',
  'SET timestamp=1334841877;',
  'SELECT  ...;'],
  Array['141129 17:24:39',
  '# User@Host: root[root] @ server.com [172.14.26.38]',
  '# Query_time: 1.896000  Lock_time: 0.000000 Rows_sent: 1 Rows_examined: 2560674',
  'SET timestamp=1334841879;',
  'SELECT  ...;']
];
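As a sketch, the second split is a single map() over the entries. The input below is a shortened, illustrative version of the entries from the previous step.

```javascript
// Two shortened entries, as produced by splitting at '# Time:' (illustrative).
const logEntries = [
  '141129 17:24:37\n# Query_time: 2.240000  Lock_time: 0.000000\nSELECT  ...;',
  '141129 17:24:39\n# Query_time: 1.896000  Lock_time: 0.000000\nSELECT  ...;'
];

// Split every entry into one segment per line.
const segmentsPerEntry = logEntries.map(entry => entry.split(/\n/));
```

If the log file uses Windows line endings, /\r?\n/ would be the safer separator, because /\n/ alone leaves a trailing carriage return on every segment.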

To make the dataset more readable, we can also remove some characters from the log entries (such as the leading #). As a last step, we need to convert the date-time string to a JavaScript Date object. We can do this here by using the %y%m%d %H:%M:%S format specifier of the D3.js time formatter. Now, we have a clean dataset with valid JavaScript dates that we can easily display in a chart, for example, as a histogram.
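To illustrate what this cleanup and conversion do, here is a dependency-free sketch. It deliberately swaps the D3.js formatter for a hand-written regular expression, so it is a stand-in rather than the method used in the text; the function names are hypothetical, and the sketch assumes that every two-digit year belongs to the 2000s.

```javascript
// Stand-in for the '%y%m%d %H:%M:%S' format (not the D3.js formatter itself).
// Returns a Date in local time, or null if the string does not match.
function parseSlowLogTime(value) {
  const m = /^(\d{2})(\d{2})(\d{2}) (\d{2}):(\d{2}):(\d{2})$/.exec(value.trim());
  if (!m) {
    return null;
  }
  // Assumption: two-digit years are in the 2000s. Months are zero-based.
  return new Date(2000 + Number(m[1]), Number(m[2]) - 1, Number(m[3]),
                  Number(m[4]), Number(m[5]), Number(m[6]));
}

// Remove a leading '#' and the whitespace after it from a segment.
function stripHash(segment) {
  return segment.replace(/^#\s*/, '');
}
```

Returning null for a non-matching string makes it easy to discard rows without a valid time in a later step.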

Let’s try it once more and parse an nginx error log with the following structure:

2014/11/29 11:13:53 [alert] 6976#8040: could not respawn worker
2014/11/29 11:14:24 [emerg] 6488#2952: unknown directive "concat" in /etc/nginx/conf/nginx.conf:76

Splitting the lines is very easy because every log entry starts on a new line; thus, we can use the /\n/ regular expression to split them:

Array[
  '2014/11/29 11:13:53 [alert] 6976#8040: could not respawn worker',
  '2014/11/29 11:14:24 [emerg] 6488#2952: unknown directive "concat" in /etc/nginx/conf/nginx.conf:76'
];

In the next step, we will divide every line into segments by splitting it at the [ and ] characters with the /\[|\]/ regular expression and trimming the whitespace that remains around each segment:

Array[
  Array['2014/11/29 11:13:53', 'alert', '6976#8040: could not respawn worker'],
  Array['2014/11/29 11:14:24', 'emerg', '6488#2952: unknown directive "concat" in /etc/nginx/conf/nginx.conf:76']
];
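Both splits can be sketched together in plain JavaScript; the variable names are illustrative. Splitting at the brackets leaves a trailing space after the date and a leading space before the message, so the sketch trims every segment.

```javascript
// Sample taken from the error log shown above.
const errorLog = [
  '2014/11/29 11:13:53 [alert] 6976#8040: could not respawn worker',
  '2014/11/29 11:14:24 [emerg] 6488#2952: unknown directive "concat" in /etc/nginx/conf/nginx.conf:76'
].join('\n');

// Split into lines, then split each line at '[' or ']' and trim the segments.
const rows = errorLog
  .split(/\n/)
  .map(line => line.split(/\[|\]/).map(segment => segment.trim()));
```

Note that a message that itself contains a [ or ] character would be split into more than three segments with this approach.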

Again, as a last step, we need to convert the date string into a JavaScript Date object; this can be done with the %Y/%m/%d %H:%M:%S formatter in this example.
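The conversion, together with the mapping of segments to an object, might look like the following sketch. As before, a hand-written regular expression stands in for the D3.js formatter, and the function and property names (time, level, message) are assumptions made for illustration.

```javascript
// Stand-in for the '%Y/%m/%d %H:%M:%S' format (not the D3.js formatter itself).
// Returns a Date in local time, or null if the string does not match.
function parseErrorLogTime(value) {
  const m = /^(\d{4})\/(\d{2})\/(\d{2}) (\d{2}):(\d{2}):(\d{2})$/.exec(value);
  if (!m) {
    return null;
  }
  // Months are zero-based in the Date constructor.
  return new Date(Number(m[1]), Number(m[2]) - 1, Number(m[3]),
                  Number(m[4]), Number(m[5]), Number(m[6]));
}

// Map the trimmed segments of one line to an object with named parts.
function toLogObject(segments) {
  const date = parseErrorLogTime(segments[0]);
  return {
    time: date ? date.getTime() : NaN, // NaN marks a row without a valid time
    level: segments[1],
    message: segments[2]
  };
}
```

Rows whose time is NaN can then be discarded before the data is grouped.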