Components¶
Here you will find the list of all Processors, Engines, Services and other components that can be used out of the box in your analytics streams.
BulkAddElasticsearch¶
Indexes the content of a Record in Elasticsearch using elasticsearch’s bulk processor
Class¶
com.hurence.logisland.processor.elasticsearch.BulkAddElasticsearch
Tags¶
elasticsearch
Properties¶
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the Expression Language.
Name | Description | Allowable Values | Default Value | Sensitive | EL |
---|---|---|---|---|---|
elasticsearch.client.service | The instance of the Controller Service to use for accessing Elasticsearch. | null | |||
default.index | The name of the index to insert into | null | true | ||
default.type | The type of this document (used by Elasticsearch for indexing and searching) | null | true | ||
timebased.index | do we add a date suffix | No date (no date added to default index), Today’s date (today’s date added to default index), yesterday’s date (yesterday’s date added to default index) | no | ||
es.index.field | the name of the event field containing es index name => will override index value if set | null | |||
es.type.field | the name of the event field containing es doc type => will override type value if set | null |
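To make the table above concrete, here is a hedged sketch of how this processor might be declared inside a Logisland stream configuration. The controller service name (elasticsearch_service) and all property values are illustrative placeholders, not defaults.
```yaml
# Illustrative declaration of BulkAddElasticsearch in a stream (values are examples)
- processor: es_publisher
  component: com.hurence.logisland.processor.elasticsearch.BulkAddElasticsearch
  type: processor
  documentation: push incoming records to Elasticsearch through the bulk processor
  configuration:
    elasticsearch.client.service: elasticsearch_service   # controller service assumed to be defined elsewhere
    default.index: logisland
    default.type: event
    timebased.index: today                # add today's date as a suffix to the default index
    es.index.field: search_index          # per-record index override, used only if the field is set
    es.type.field: record_type
```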
ConsolidateSession¶
The ConsolidateSession processor is the Logisland entry point to get and process events from the Web Analytics. As an example, here is an incoming event from the Web Analytics:
```json
"fields": [
  { "name": "timestamp",           "type": "long" },
  { "name": "remoteHost",          "type": "string" },
  { "name": "record_type",         "type": ["null", "string"],  "default": null },
  { "name": "record_id",           "type": ["null", "string"],  "default": null },
  { "name": "location",            "type": ["null", "string"],  "default": null },
  { "name": "hitType",             "type": ["null", "string"],  "default": null },
  { "name": "eventCategory",       "type": ["null", "string"],  "default": null },
  { "name": "eventAction",         "type": ["null", "string"],  "default": null },
  { "name": "eventLabel",          "type": ["null", "string"],  "default": null },
  { "name": "localPath",           "type": ["null", "string"],  "default": null },
  { "name": "q",                   "type": ["null", "string"],  "default": null },
  { "name": "n",                   "type": ["null", "int"],     "default": null },
  { "name": "referer",             "type": ["null", "string"],  "default": null },
  { "name": "viewportPixelWidth",  "type": ["null", "int"],     "default": null },
  { "name": "viewportPixelHeight", "type": ["null", "int"],     "default": null },
  { "name": "screenPixelWidth",    "type": ["null", "int"],     "default": null },
  { "name": "screenPixelHeight",   "type": ["null", "int"],     "default": null },
  { "name": "partyId",             "type": ["null", "string"],  "default": null },
  { "name": "sessionId",           "type": ["null", "string"],  "default": null },
  { "name": "pageViewId",          "type": ["null", "string"],  "default": null },
  { "name": "is_newSession",       "type": ["null", "boolean"], "default": null },
  { "name": "userAgentString",     "type": ["null", "string"],  "default": null },
  { "name": "pageType",            "type": ["null", "string"],  "default": null },
  { "name": "UserId",              "type": ["null", "string"],  "default": null },
  { "name": "B2Bunit",             "type": ["null", "string"],  "default": null },
  { "name": "pointOfService",      "type": ["null", "string"],  "default": null },
  { "name": "companyID",           "type": ["null", "string"],  "default": null },
  { "name": "GroupCode",           "type": ["null", "string"],  "default": null },
  { "name": "userRoles",           "type": ["null", "string"],  "default": null },
  { "name": "is_PunchOut",         "type": ["null", "string"],  "default": null }
]
```
The ConsolidateSession processor groups the records by session and computes the duration between now and the last received event. If the distance from the last event is beyond a given threshold (by default 30 minutes), the session is considered closed. The ConsolidateSession processor builds an aggregated session object for each active session. This aggregated object includes:
- the actual session duration,
- a boolean stating whether the session is considered active or closed (note: it is possible to resurrect a session if, for instance, an event arrives after the session has been marked closed),
- user related information: userId, B2Bunit code, groupCode, userRoles, companyId,
- first visited page: URL,
- last visited page: URL.
The properties to configure the processor are:
- sessionid.field: the name of the field containing the session identifier (default: sessionId).
- timestamp.field: the name of the field containing the timestamp of the event (default: timestamp).
- session.timeout: timeframe of inactivity (in seconds) after which a session is considered closed (default: 30 minutes).
- visitedpage.field: the name of the field containing the page visited by the customer (default: location).
- fields.to.return: the list of fields to return in the aggregated object (default: N/A).
Class¶
com.hurence.logisland.processor.consolidateSession.ConsolidateSession
Tags¶
analytics, web, session
Properties¶
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
Name | Description | Allowable Values | Default Value | Sensitive | EL |
---|---|---|---|---|---|
debug | Enable debug. If enabled, the original JSON string is embedded in the record_value field of the record. | null | |||
session.timeout | session timeout in sec | 1800 | |||
sessionid.field | the name of the field containing the session id => will override default value if set | sessionId | |||
timestamp.field | the name of the field containing the timestamp => will override default value if set | h2kTimestamp | |||
visitedpage.field | the name of the field containing the visited page => will override default value if set | location | |||
userid.field | the name of the field containing the userId => will override default value if set | userId | |||
fields.to.return | the list of fields to return | null | |||
firstVisitedPage.out.field | the name of the field containing the first visited page => will override default value if set | firstVisitedPage | |||
lastVisitedPage.out.field | the name of the field containing the last visited page => will override default value if set | lastVisitedPage | |||
isSessionActive.out.field | the name of the field stating whether the session is active or not => will override default value if set | is_sessionActive | |||
sessionDuration.out.field | the name of the field containing the session duration => will override default value if set | sessionDuration | |||
eventsCounter.out.field | the name of the field containing the number of events in the session => will override default value if set | eventsCounter |||
firstEventDateTime.out.field | the name of the field containing the date of the first event => will override default value if set | firstEventDateTime | |||
lastEventDateTime.out.field | the name of the field containing the date of the last event => will override default value if set | lastEventDateTime | |||
sessionInactivityDuration.out.field | the name of the field containing the session inactivity duration => will override default value if set | sessionInactivityDuration |
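Below is a minimal, illustrative configuration sketch for this processor; the field names follow the defaults listed above and the 45-minute timeout is only an example.
```yaml
- processor: consolidate_session
  component: com.hurence.logisland.processor.consolidateSession.ConsolidateSession
  type: processor
  documentation: compute session duration and activity from web analytics events
  configuration:
    debug: false
    session.timeout: 2700            # 45 minutes of inactivity closes the session (example value)
    sessionid.field: sessionId
    timestamp.field: h2kTimestamp
    visitedpage.field: location
    userid.field: userId
    fields.to.return: partyId,B2Bunit
```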
ConvertFieldsType¶
Converts a field value into the given type. Does nothing if the conversion is not possible.
Class¶
com.hurence.logisland.processor.ConvertFieldsType
Tags¶
type, fields, update, convert
Properties¶
This component has no required or optional properties.
DebugStream¶
This is a processor that logs incoming records
Class¶
com.hurence.logisland.processor.DebugStream
Tags¶
record, debug
Properties¶
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
Name | Description | Allowable Values | Default Value | Sensitive | EL |
---|---|---|---|---|---|
event.serializer | the way to serialize event | Json serialization (serialize events as json blocks), String serialization (serialize events as toString() blocks) | json
DetectOutliers¶
Outlier Analysis: A Hybrid Approach
In order to function at scale, a two-phase approach is taken
For every data point
- Detect outlier candidates using a robust estimator of variability (e.g. median absolute deviation) that uses distributional sketching (e.g. Q-trees)
- Gather a biased sample (biased by recency)
- Extremely deterministic in space and cheap in computation
For every outlier candidate
- Use traditional, more computationally complex approaches to outlier analysis (e.g. Robust PCA) on the biased sample
- Expensive computationally, but run infrequently
This becomes a data filter which can be attached to a timeseries data stream within a distributed computational framework (i.e. Storm, Spark, Flink, NiFi) to detect outliers.
Class¶
com.hurence.logisland.processor.DetectOutliers
Tags¶
analytic, outlier, record, iot, timeseries
Properties¶
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
Name | Description | Allowable Values | Default Value | Sensitive | EL |
---|---|---|---|---|---|
value.field | the numeric field to get the value | record_value | |||
time.field | the field to get the time value from | record_time |||
output.record.type | the output type of the record | alert_match | |||
rotation.policy.type | ... | by_amount, by_time, never | by_amount | ||
rotation.policy.amount | ... | 100 | |||
rotation.policy.unit | ... | milliseconds, seconds, hours, days, months, years, points | points | ||
chunking.policy.type | ... | by_amount, by_time, never | by_amount | ||
chunking.policy.amount | ... | 100 | |||
chunking.policy.unit | ... | milliseconds, seconds, hours, days, months, years, points | points | ||
sketchy.outlier.algorithm | ... | SKETCHY_MOVING_MAD | SKETCHY_MOVING_MAD | ||
batch.outlier.algorithm | ... | RAD | RAD | ||
global.statistics.min | minimum value | null | |||
global.statistics.max | maximum value | null | |||
global.statistics.mean | mean value | null | |||
global.statistics.stddev | standard deviation value | null | |||
zscore.cutoffs.normal | zscoreCutoffs level for normal outlier | 0.000000000000001 | |||
zscore.cutoffs.moderate | zscoreCutoffs level for moderate outlier | 1.5 | |||
zscore.cutoffs.severe | zscoreCutoffs level for severe outlier | 10.0 | |||
zscore.cutoffs.notEnoughData | zscoreCutoffs level for notEnoughData outlier | 100 | |||
smooth | do smoothing ? | false | |||
decay | the decay | 0.1 | |||
min.amount.to.predict | minAmountToPredict | 100 | |||
min_zscore_percentile | minZscorePercentile | 50.0 | |||
reservoir_size | the size of points reservoir | 100 | |||
rpca.force.diff | No Description Provided. | null | |||
rpca.lpenalty | No Description Provided. | null | |||
rpca.min.records | No Description Provided. | null | |||
rpca.spenalty | No Description Provided. | null | |||
rpca.threshold | No Description Provided. | null |
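As a rough illustration of how these knobs fit together, the sketch below configures the sketchy/batch pipeline on a numeric field; every value is an example, not a recommendation.
```yaml
- processor: detect_outliers
  component: com.hurence.logisland.processor.DetectOutliers
  type: processor
  documentation: flag outlier points in a numeric time series
  configuration:
    value.field: record_value
    time.field: record_time
    output.record.type: alert_match
    rotation.policy.type: by_amount
    rotation.policy.amount: 100
    rotation.policy.unit: points
    zscore.cutoffs.normal: 3.5       # illustrative cutoff, not the default
    zscore.cutoffs.severe: 10.0
    min.amount.to.predict: 200
```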
EnrichRecordsElasticsearch¶
Enriches input records with content indexed in Elasticsearch using multiget queries. Each incoming record may be enriched with information stored in Elasticsearch. The plugin properties are:
- es.index (String): name of the Elasticsearch index on which the multiget query will be performed. This field is mandatory and should not be empty, otherwise an error output record is sent for this specific incoming record.
- record.key (String): name of the field in the input record containing the id used to look up the document in Elasticsearch. This field is mandatory.
- es.key (String): name of the Elasticsearch key on which the multiget query will be performed. This field is mandatory.
- includes (ArrayList<String>): list of patterns to filter in (include) fields to retrieve. Supports wildcards. This field is not mandatory.
- excludes (ArrayList<String>): list of patterns to filter out (exclude) fields to retrieve. Supports wildcards. This field is not mandatory.
Each outgoing record holds at least the input record plus, potentially, one or more fields coming from one Elasticsearch document.
Class¶
com.hurence.logisland.processor.elasticsearch.EnrichRecordsElasticsearch
Tags¶
elasticsearch
Properties¶
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
Name | Description | Allowable Values | Default Value | Sensitive | EL |
---|---|---|---|---|---|
elasticsearch.client.service | The instance of the Controller Service to use for accessing Elasticsearch. | null | |||
record.key | The name of field in the input record containing the document id to use in ES multiget query | null | |||
es.index | The name of the ES index to use in multiget query. | null | |||
es.type | The name of the ES type to use in multiget query. | null | |||
es.includes.field | The name of the ES fields to include in the record. | ||||
es.excludes.field | The name of the ES fields to exclude. | N/A |
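For instance, a sketch of a lookup against a hypothetical customers index could look like the following; the service name and field names are assumptions for the example.
```yaml
- processor: enrich_from_es
  component: com.hurence.logisland.processor.elasticsearch.EnrichRecordsElasticsearch
  type: processor
  documentation: join each record with a reference document stored in Elasticsearch
  configuration:
    elasticsearch.client.service: elasticsearch_service   # assumed controller service name
    record.key: user_id            # incoming record field holding the document id
    es.index: customers
    es.type: customer
    es.includes.field: "*"         # retrieve every field of the matched document
```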
EvaluateJsonPath¶
Evaluates one or more JsonPath expressions against the content of a FlowFile. The results of those expressions are assigned to Records Fields depending on configuration of the Processor. JsonPaths are entered by adding user-defined properties; the name of the property maps to the Field Name into which the result will be placed. The value of the property must be a valid JsonPath expression. A Return Type of ‘auto-detect’ will make a determination based off the configured destination. If the JsonPath evaluates to a JSON array or JSON object and the Return Type is set to ‘scalar’ the Record will be routed to error. A Return Type of JSON can return scalar values if the provided JsonPath evaluates to the specified value. If the expression matches nothing, Fields will be created with empty strings as the value
Class¶
com.hurence.logisland.processor.EvaluateJsonPath
Tags¶
JSON, evaluate, JsonPath
Properties¶
This component has no required or optional properties.
FetchHBaseRow¶
Fetches a row from an HBase table. The Destination property controls whether the cells are added as flow file attributes, or the row is written to the flow file content as JSON. This processor may be used to fetch a fixed row on an interval by specifying the table and row id directly in the processor, or it may be used to dynamically fetch rows by referencing the table and row id from incoming flow files.
Class¶
com.hurence.logisland.processor.hbase.FetchHBaseRow
Tags¶
hbase, scan, fetch, get, enrich
Properties¶
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the Expression Language.
FilterRecords¶
Keep only records based on a given field value
Class¶
com.hurence.logisland.processor.FilterRecords
Tags¶
record, fields, remove, delete
Properties¶
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
Name | Description | Allowable Values | Default Value | Sensitive | EL |
---|---|---|---|---|---|
field.name | the field name | record_id | |||
field.value | the field value to keep | null |
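A minimal usage sketch, keeping only records whose hypothetical http_status field equals 500:
```yaml
- processor: keep_errors_only
  component: com.hurence.logisland.processor.FilterRecords
  type: processor
  documentation: keep only the records matching a given field value
  configuration:
    field.name: http_status
    field.value: 500
```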
GenerateRandomRecord¶
This is a processor that makes random records given an Avro schema
Class¶
com.hurence.logisland.processor.GenerateRandomRecord
Tags¶
record, avro, generator
Properties¶
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
Name | Description | Allowable Values | Default Value | Sensitive | EL |
---|---|---|---|---|---|
avro.output.schema | the avro schema definition for the output serialization | null | |||
min.events.count | the minimum number of generated events each run | 10 | |||
max.events.count | the maximum number of generated events each run | 200 |
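The sketch below shows one way the processor could be fed an Avro schema; both the schema and the event counts are purely illustrative.
```yaml
- processor: random_generator
  component: com.hurence.logisland.processor.GenerateRandomRecord
  type: processor
  documentation: emit synthetic records for testing a stream
  configuration:
    min.events.count: 5
    max.events.count: 20
    avro.output.schema: >
      { "version": 1,
        "type": "record",
        "name": "com.hurence.logisland.record.apache_log",
        "fields": [
          { "name": "src_ip",      "type": ["string", "null"] },
          { "name": "http_status", "type": ["string", "null"] },
          { "name": "bytes_out",   "type": ["long", "null"] } ] }
```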
MatchQuery¶
Query matching based on Luwak
You can use this processor to handle custom events defined by Lucene queries. A new record is added to the output each time a registered query is matched.
A query is expressed as a Lucene query against a field, for example:
message:'bad exception'
error_count:[10 TO *]
bytes_out:5000
user_name:tom*
Please read the Lucene syntax guide for supported operations
Warning
don't forget to set the numeric.fields property so that numeric range queries are handled correctly
Class¶
com.hurence.logisland.processor.MatchQuery
Tags¶
analytic, percolator, record, query, lucene
Properties¶
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
Name | Description | Allowable Values | Default Value | Sensitive | EL |
---|---|---|---|---|---|
numeric.fields | a comma separated string of numeric fields to be matched | null |||
output.record.type | the output type of the record | alert_match | |||
include.input.records | if set to true all the input records are copied to output | true |
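In practice the queries themselves are registered as extra configuration keys (query name => Lucene query), as in the hedged sketch below; the alert names and queries are examples only.
```yaml
- processor: match_query
  component: com.hurence.logisland.processor.MatchQuery
  type: processor
  documentation: raise an alert record whenever a registered Lucene query matches
  configuration:
    numeric.fields: bytes_out,error_count        # required for numeric range queries
    output.record.type: alert_match
    include.input.records: false
    too_much_bandwidth: bytes_out:[25000 TO *]   # dynamic property, alert name -> Lucene query
    blacklisted_user: user_name:tom*
```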
ModifyId¶
Modifies the id of records, or generates it, following defined rules
Class¶
com.hurence.logisland.processor.ModifyId
Tags¶
record, id, idempotent, generate, modify
Properties¶
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
Name | Description | Allowable Values | Default Value | Sensitive | EL |
---|---|---|---|---|---|
id.generation.strategy | the strategy to generate new Id | generate a random uid (generate a randomUid using java library), generate a hash from fields (generate a hash from fields), generate a string from java pattern and fields (generate a string from java pattern and fields), generate a concatenation of type, time and a hash from fields (generate a concatenation of type, time and a hash from fields (as for generate_hash strategy)) | randomUuid | ||
fields.to.hash | the comma separated list of field names (e.g. ‘policyid,date_raw’) | record_raw_value |||
hash.charset | the charset to use to hash id string (e.g. ‘UTF-8’) | UTF-8 | |||
hash.algorithm | the algorithm to use to hash the id string (e.g. ‘SHA-256’) | SHA-384, SHA-224, SHA-256, MD2, SHA, SHA-512, MD5 | SHA-256 | ||
java.formatter.string | the format to use to build the id string, e.g. ‘%4$2s %3$2s %2$2s %1$2s’ (see java Formatter) | null |||
language.tag | the language to use to format numbers in string | aa, ab, ae, af, ak, am, an, ar, as, av, ay, az, ba, be, bg, bh, bi, bm, bn, bo, br, bs, ca, ce, ch, co, cr, cs, cu, cv, cy, da, de, dv, dz, ee, el, en, eo, es, et, eu, fa, ff, fi, fj, fo, fr, fy, ga, gd, gl, gn, gu, gv, ha, he, hi, ho, hr, ht, hu, hy, hz, ia, id, ie, ig, ii, ik, in, io, is, it, iu, iw, ja, ji, jv, ka, kg, ki, kj, kk, kl, km, kn, ko, kr, ks, ku, kv, kw, ky, la, lb, lg, li, ln, lo, lt, lu, lv, mg, mh, mi, mk, ml, mn, mo, mr, ms, mt, my, na, nb, nd, ne, ng, nl, nn, no, nr, nv, ny, oc, oj, om, or, os, pa, pi, pl, ps, pt, qu, rm, rn, ro, ru, rw, sa, sc, sd, se, sg, si, sk, sl, sm, sn, so, sq, sr, ss, st, su, sv, sw, ta, te, tg, th, ti, tk, tl, tn, to, tr, ts, tt, tw, ty, ug, uk, ur, uz, ve, vi, vo, wa, wo, xh, yi, yo, za, zh, zu | en |
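A possible configuration for a deterministic, hash-based id is sketched below; note that the exact allowable-value token for the hash strategy is an assumption here (check the allowable values listed above), and the hashed fields are examples.
```yaml
- processor: set_deterministic_id
  component: com.hurence.logisland.processor.ModifyId
  type: processor
  documentation: build an idempotent record id from a hash of selected fields
  configuration:
    id.generation.strategy: hashFields   # assumed token for the "generate a hash from fields" strategy
    fields.to.hash: record_raw_value
    hash.charset: UTF-8
    hash.algorithm: SHA-256
```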
MultiGetElasticsearch¶
Retrieves content indexed in Elasticsearch using multiget queries. Each incoming record contains information regarding the Elasticsearch multiget query that will be performed. This information is stored in record fields whose names are configured in the plugin properties (see below):
- index (String): name of the Elasticsearch index on which the multiget query will be performed. This field is mandatory and should not be empty, otherwise an error output record is sent for this specific incoming record.
- type (String): name of the Elasticsearch type on which the multiget query will be performed. This field is not mandatory.
- ids (String): comma separated list of document ids to fetch. This field is mandatory and should not be empty, otherwise an error output record is sent for this specific incoming record.
- includes (String): comma separated list of patterns to filter in (include) fields to retrieve. Supports wildcards. This field is not mandatory.
- excludes (String): comma separated list of patterns to filter out (exclude) fields to retrieve. Supports wildcards. This field is not mandatory.
Each outgoing record holds the data of one retrieved Elasticsearch document, stored in these fields:
- index (same field name as in the incoming record): name of the Elasticsearch index.
- type (same field name as in the incoming record): name of the Elasticsearch type.
- id (same field name as in the incoming record): id of the retrieved document.
- a list of String fields containing:
  - field name: the retrieved field name
  - field value: the retrieved field value
Class¶
com.hurence.logisland.processor.elasticsearch.MultiGetElasticsearch
Tags¶
elasticsearch
Properties¶
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
Name | Description | Allowable Values | Default Value | Sensitive | EL |
---|---|---|---|---|---|
elasticsearch.client.service | The instance of the Controller Service to use for accessing Elasticsearch. | null | |||
es.index.field | the name of the incoming records field containing es index name to use in multiget query. | null | |||
es.type.field | the name of the incoming records field containing es type name to use in multiget query | null | |||
es.ids.field | the name of the incoming records field containing es document Ids to use in multiget query | null | |||
es.includes.field | the name of the incoming records field containing es includes to use in multiget query | null | |||
es.excludes.field | the name of the incoming records field containing es excludes to use in multiget query | null |
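The sketch below wires the processor to incoming records that carry the index, ids and field filters; every field name on the right-hand side is an example of what your records might contain.
```yaml
- processor: es_multiget
  component: com.hurence.logisland.processor.elasticsearch.MultiGetElasticsearch
  type: processor
  documentation: fetch documents whose ids are carried by the incoming records
  configuration:
    elasticsearch.client.service: elasticsearch_service   # assumed controller service name
    es.index.field: search_index      # incoming record field holding the index name
    es.type.field: es_type            # incoming record field holding the type name
    es.ids.field: document_ids        # incoming record field holding the comma separated ids
    es.includes.field: includes
    es.excludes.field: excludes
```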
NormalizeFields¶
Changes the name of a field according to a provided name mapping ...
Class¶
com.hurence.logisland.processor.NormalizeFields
Tags¶
record, fields, normalizer
Properties¶
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
Name | Description | Allowable Values | Default Value | Sensitive | EL |
---|---|---|---|---|---|
conflict.resolution.policy | what to do when a field with the same name already exists | nothing to do (leave record as it was), overwrite existing field (if field already exists), keep only old field and delete the other (keep only old field and delete the other), keep old field and new one (creates an alias for the new field) | do_nothing
ParseBroEvent¶
The ParseBroEvent processor is the Logisland entry point to get and process Bro events. The Bro-Kafka plugin should be used and configured in order to have Bro events sent to Kafka. See the Bro/Logisland tutorial for an example of usage for this processor. The ParseBroEvent processor does some minor pre-processing on incoming Bro events from the Bro-Kafka plugin to adapt them to Logisland.
Basically the events coming from the Bro-Kafka plugin are JSON documents with a first level field indicating the type of the event. The ParseBroEvent processor takes the incoming JSON document, sets the event type in a record_type field and sets the original sub-fields of the JSON event as first level fields in the record. Also any dot in a field name is transformed into an underscore. Thus, for instance, the field id.orig_h becomes id_orig_h. The next processors in the stream can then process the Bro events generated by this ParseBroEvent processor.
As an example here is an incoming event from Bro:
```json
{
  "conn": {
    "id.resp_p": 9092,
    "resp_pkts": 0,
    "resp_ip_bytes": 0,
    "local_orig": true,
    "orig_ip_bytes": 0,
    "orig_pkts": 0,
    "missed_bytes": 0,
    "history": "Cc",
    "tunnel_parents": [],
    "id.orig_p": 56762,
    "local_resp": true,
    "uid": "Ct3Ms01I3Yc6pmMZx7",
    "conn_state": "OTH",
    "id.orig_h": "172.17.0.2",
    "proto": "tcp",
    "id.resp_h": "172.17.0.3",
    "ts": 1487596886.953917
  }
}
```
It gets processed and transformed into the following Logisland record by the ParseBroEvent processor:
```
"@timestamp": "2017-02-20T13:36:32Z"
"record_id": "6361f80a-c5c9-4a16-9045-4bb51736333d"
"record_time": 1487597792782
"record_type": "conn"
"id_resp_p": 9092
"resp_pkts": 0
"resp_ip_bytes": 0
"local_orig": true
"orig_ip_bytes": 0
"orig_pkts": 0
"missed_bytes": 0
"history": "Cc"
"tunnel_parents": []
"id_orig_p": 56762
"local_resp": true
"uid": "Ct3Ms01I3Yc6pmMZx7"
"conn_state": "OTH"
"id_orig_h": "172.17.0.2"
"proto": "tcp"
"id_resp_h": "172.17.0.3"
"ts": 1487596886.953917
```
Class¶
com.hurence.logisland.processor.bro.ParseBroEvent
Tags¶
bro, security, IDS, NIDS
Properties¶
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
Name | Description | Allowable Values | Default Value | Sensitive | EL |
---|---|---|---|---|---|
debug | Enable debug. If enabled, the original JSON string is embedded in the record_value field of the record. | null |
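A minimal, illustrative declaration of this parser in a stream (the processor name and type value are arbitrary):
```yaml
- processor: bro_parser
  component: com.hurence.logisland.processor.bro.ParseBroEvent
  type: parser
  documentation: adapt Bro events coming from the Bro-Kafka plugin to Logisland records
  configuration:
    debug: false
```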
ParseNetflowEvent¶
The Netflow V5 processor is the Logisland entry point to process NetFlow (V5) events. NetFlow is a feature introduced on Cisco routers that provides the ability to collect IP network traffic. We can distinguish two components:
- Flow exporter: aggregates packets into flows and exports flow records (binary format) towards one or more flow collectors
- Flow collector: responsible for reception, storage and pre-processing of flow data received from a flow exporter
The collected data is then available for analysis purposes (intrusion detection, traffic analysis...). NetFlow events are sent to Kafka in order to be processed by Logisland. In the tutorial we simulate NetFlow traffic using nfgen; this traffic is sent to port 2055. We then rely on NiFi to listen on that port for incoming NetFlow (V5) traffic and forward it to a Kafka topic. The NetFlow processor can thus process these events and generate the corresponding Logisland records. The next processors in the stream can then process the NetFlow records generated by this processor.
Class¶
com.hurence.logisland.processor.netflow.ParseNetflowEvent
Tags¶
netflow, security
Properties¶
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
Name | Description | Allowable Values | Default Value | Sensitive | EL |
---|---|---|---|---|---|
debug | Enable debug. If enabled, the original JSON string is embedded in the record_value field of the record. | null | |||
output.record.type | the output type of the record | netflowevent | |||
enrich.record | Enrich data. If enabled, the netflow record is enriched with inferred data | false
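An illustrative declaration could look like the sketch below; the enrichment flag is shown enabled only as an example.
```yaml
- processor: netflow_parser
  component: com.hurence.logisland.processor.netflow.ParseNetflowEvent
  type: parser
  documentation: turn raw NetFlow V5 packets into Logisland records
  configuration:
    debug: false
    output.record.type: netflowevent
    enrich.record: true         # also add the inferred fields
```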
ParseNetworkPacket¶
The ParseNetworkPacket processor is the LogIsland entry point to parse network packets captured either off-the-wire (stream mode) or in pcap format (batch mode). In batch mode, the processor decodes the bytes of the incoming pcap record, where a Global header followed by a sequence of [packet header, packet data] pairs are stored. Then, each incoming pcap event is parsed into n packet records. The fields of packet headers are then extracted and made available in dedicated record fields. See the Capturing Network packets tutorial for an example of usage of this processor.
Class¶
com.hurence.logisland.processor.networkpacket.ParseNetworkPacket
Tags¶
PCap, security, IDS, NIDS
Properties¶
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
Name | Description | Allowable Values | Default Value | Sensitive | EL |
---|---|---|---|---|---|
debug | Enable debug. | false | |||
flow.mode | Flow Mode. Indicate whether packets are provided in batch mode (via pcap files) or in stream mode (without headers). Allowed values are batch and stream. | batch, stream | null |
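For example, a batch-mode (pcap file) declaration might look like this sketch:
```yaml
- processor: pcap_parser
  component: com.hurence.logisland.processor.networkpacket.ParseNetworkPacket
  type: parser
  documentation: decode pcap records into one record per network packet
  configuration:
    debug: false
    flow.mode: batch       # use "stream" for off-the-wire capture without pcap headers
```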
ParseProperties¶
Parses a field made of key=value pairs separated by spaces. A string like "a=1 b=2 c=3" will add fields a, b and c to the current Record, with values 1, 2 and 3 respectively.
Class¶
com.hurence.logisland.processor.ParseProperties
Tags¶
record, properties, parser
Properties¶
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
Name | Description | Allowable Values | Default Value | Sensitive | EL |
---|---|---|---|---|---|
properties.field | the field containing the properties to split and treat | null |
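A minimal sketch, assuming the key=value pairs arrive in a field named message (an example name, not a default):
```yaml
- processor: parse_properties
  component: com.hurence.logisland.processor.ParseProperties
  type: processor
  documentation: explode a "k=v k=v" string field into individual record fields
  configuration:
    properties.field: message
```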
ParseUserAgent¶
The user-agent processor decomposes the User-Agent value from an HTTP header into several attributes of interest. There is no standard format for User-Agent strings, hence it is not easily possible to use regular expressions to handle them. This processor relies on the YAUAA library to do the heavy work.
Class¶
com.hurence.logisland.processor.useragent.ParseUserAgent
Tags¶
User-Agent, clickstream, DMP
Properties¶
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
Name | Description | Allowable Values | Default Value | Sensitive | EL |
---|---|---|---|---|---|
debug | Enable debug. | false | |||
cache.enabled | Enable caching. Caching to avoid to redo the same computation for many identical User-Agent strings. | true | |||
cache.size | Set the size of the cache. | 1000 | |||
useragent.field | Must contain the name of the field that contains the User-Agent value in the incoming record. | null | |||
useragent.keep | Defines if the field that contained the User-Agent must be kept or not in the resulting records. | true | |||
confidence.enabled | Enable confidence reporting. Each field will report a confidence attribute with a value comprised between 0 and 10000. | false | |||
ambiguity.enabled | Enable ambiguity reporting. Reports a count of ambiguities. | false | |||
fields | Defines the fields to be returned. | DeviceClass, DeviceName, DeviceBrand, DeviceCpu, DeviceFirmwareVersion, DeviceVersion, OperatingSystemClass, OperatingSystemName, OperatingSystemVersion, OperatingSystemNameVersion, OperatingSystemVersionBuild, LayoutEngineClass, LayoutEngineName, LayoutEngineVersion, LayoutEngineVersionMajor, LayoutEngineNameVersion, LayoutEngineNameVersionMajor, LayoutEngineBuild, AgentClass, AgentName, AgentVersion, AgentVersionMajor, AgentNameVersion, AgentNameVersionMajor, AgentBuild, AgentLanguage, AgentLanguageCode, AgentInformationEmail, AgentInformationUrl, AgentSecurity, AgentUuid, FacebookCarrier, FacebookDeviceClass, FacebookDeviceName, FacebookDeviceVersion, FacebookFBOP, FacebookFBSS, FacebookOperatingSystemName, FacebookOperatingSystemVersion, Anonymized, HackerAttackVector, HackerToolkit, KoboAffiliate, KoboPlatformId, IECompatibilityVersion, IECompatibilityVersionMajor, IECompatibilityNameVersion, IECompatibilityNameVersionMajor, __SyntaxError__, Carrier, GSAInstallationID, WebviewAppName, WebviewAppNameVersionMajor, WebviewAppVersion, WebviewAppVersionMajor |
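A hedged configuration sketch, assuming the raw User-Agent string sits in a field named useragent and only a handful of output fields are needed:
```yaml
- processor: user_agent_analyzer
  component: com.hurence.logisland.processor.useragent.ParseUserAgent
  type: processor
  documentation: decompose the User-Agent header into device / OS / agent fields
  configuration:
    useragent.field: useragent      # example field name carrying the raw User-Agent string
    useragent.keep: true
    cache.enabled: true
    cache.size: 1000
    fields: DeviceClass,OperatingSystemName,OperatingSystemVersion,AgentName,AgentVersion
```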
PutHBaseCell¶
Adds the Contents of a Record to HBase as the value of a single cell
Class¶
com.hurence.logisland.processor.hbase.PutHBaseCell
Tags¶
hadoop, hbase
Properties¶
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the Expression Language.
Name | Description | Allowable Values | Default Value | Sensitive | EL |
---|---|---|---|---|---|
hbase.client.service | The instance of the Controller Service to use for accessing HBase. | null | |||
table.name.field | The field containing the name of the HBase Table to put data into | null | true | ||
row.identifier.field | Specifies field containing the Row ID to use when inserting data into HBase | null | true | ||
row.identifier.encoding.strategy | Specifies the data type of Row ID used when inserting data into HBase. The default behavior is to convert the row id to a UTF-8 byte array. Choosing Binary will convert a binary formatted string to the correct byte[] representation. The Binary option should be used if you are using Binary row keys in HBase | String (Stores the value of row id as a UTF-8 String.), Binary (Stores the value of the rows id as a binary byte array. It expects that the row id is a binary formatted string.) | String | ||
column.family.field | The field containing the Column Family to use when inserting data into HBase | null | true | ||
column.qualifier.field | The field containing the Column Qualifier to use when inserting data into HBase | null | true | ||
batch.size | The maximum number of Records to process in a single execution. The Records will be grouped by table, and a single Put per table will be performed. | 25 | |||
record.schema | the avro schema definition for the Avro serialization | null | |||
record.serializer | the serializer needed to i/o the record in the HBase row | kryo serialization (serialize events as kryo blocks), json serialization (serialize events as json blocks), avro serialization (serialize events as avro blocks), no serialization (send events as bytes) | com.hurence.logisland.serializer.KryoSerializer | ||
table.name.default | The table table to use if table name field is not set | null | |||
column.family.default | The column family to use if column family field is not set | null | |||
column.qualifier.default | The column qualifier to use if column qualifier field is not set | null |
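Putting the main properties together, a sketch of a cell-per-record write could look like the following; the service name, table and the record fields holding family/qualifier are assumptions for the example.
```yaml
- processor: hbase_writer
  component: com.hurence.logisland.processor.hbase.PutHBaseCell
  type: processor
  documentation: write one cell per record into HBase
  configuration:
    hbase.client.service: hbase_service     # assumed controller service name
    table.name.default: logisland_events
    row.identifier.field: record_id
    column.family.field: cf                 # record field holding the column family
    column.qualifier.field: cq              # record field holding the column qualifier
    batch.size: 25
    record.serializer: com.hurence.logisland.serializer.KryoSerializer
```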
RemoveFields¶
Removes a list of fields defined by a comma separated list of field names
Class¶
com.hurence.logisland.processor.RemoveFields
Tags¶
record, fields, remove, delete
Properties¶
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
Name | Description | Allowable Values | Default Value | Sensitive | EL |
---|---|---|---|---|---|
fields.to.remove | the comma separated list of field names (e.g. ‘policyid,date_raw’) | null
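Usage is a one-liner; the field names below are the same examples used in the property description.
```yaml
- processor: cleanup
  component: com.hurence.logisland.processor.RemoveFields
  type: processor
  documentation: drop technical fields before further processing
  configuration:
    fields.to.remove: policyid,date_raw
```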
RunPython¶
!!!! WARNING !!!!
The RunPython processor is currently an experimental feature: it is delivered as is, with the current set of features, and is subject to modifications of its API or anything else in further Logisland releases, without warning. There is no tutorial yet. If you want to play with this processor, use the python-processing.yml example and send it the Apache logs from the index apache logs tutorial. The debug stream processor at the end of the stream should output events in the stderr file of the executors, visible from the Spark console.
This processor allows you to implement and run a processor written in Python. This can be done in two ways: either by directly defining the process method code in the script.code.process configuration property, or by pointing to an external Python module script file in the script.path configuration property. Directly defining methods is called the inline mode whereas using a script file is called the file mode. Both ways are mutually exclusive. Whether using the inline or the file mode, your Python code may depend on some Python dependencies. If the set of Python dependencies already delivered with the Logisland framework is not sufficient, you can use the dependencies.path configuration property to give their location. Currently only the nltk Python library is delivered with Logisland.
Class¶
com.hurence.logisland.processor.scripting.python.RunPython
Tags¶
scripting, python
Properties¶
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
Name | Description | Allowable Values | Default Value | Sensitive | EL |
---|---|---|---|---|---|
script.code.imports | For inline mode only. This is the python code that should hold the import statements if required. | null |||
script.code.init | The python code to be called when the processor is initialized. This is the python equivalent of the init method code for a java processor. This is not mandatory but can only be used if script.code.process is defined (inline mode). | null | |||
script.code.process | The python code to be called to process the records. This is the python equivalent of the process method code for a java processor. For inline mode, this is the only minimum required configuration property. Using this property, you may also optionally define the script.code.init and script.code.imports properties. | null |||
script.path | The path to the user’s python processor script. Use this property for file mode. Your python code must be in a python file with the following constraints: let’s say your python script is named MyProcessor.py. Then MyProcessor.py is a module file that must contain a class named MyProcessor which must inherit from the Logisland delivered class named AbstractProcessor. You can then define your code in the process method and in the other traditional methods (init...) as you would do in java in a class inheriting from the AbstractProcessor java class. | null |||
dependencies.path | The path to the additional dependencies for the user’s python code, whether using inline or file mode. This is optional as your code may not have additional dependencies. If you defined script.path (so using file mode) and if dependencies.path is not defined, Logisland will scan a potential directory named dependencies in the same directory where the script file resides and if it exists, any python code located there will be loaded as dependency as needed. | null | |||
logisland.dependencies.path | The path to the directory containing the python dependencies shipped with logisland. You should not have to tune this parameter. | null |
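As a file-mode sketch (paths are placeholders; the module file must define a class with the same name that extends AbstractProcessor, as described for script.path above):
```yaml
- processor: python_processor
  component: com.hurence.logisland.processor.scripting.python.RunPython
  type: processor
  documentation: file-mode example, the processing logic lives in MyProcessor.py
  configuration:
    script.path: /opt/logisland/scripts/MyProcessor.py        # module defining a MyProcessor class extending AbstractProcessor
    dependencies.path: /opt/logisland/scripts/dependencies    # optional; a "dependencies" directory next to the script is scanned by default
```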
SampleRecords¶
Samples incoming time series records using a configurable sampling algorithm in order to reduce the number of data points. The numeric field to sample, the time field, the sampling algorithm (none, lttb, average, first_item, min_max, mode_median) and its parameter are set through the properties below.
Class¶
com.hurence.logisland.processor.SampleRecords
Tags¶
analytic, sampler, record, iot, timeseries
Properties¶
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
Name | Description | Allowable Values | Default Value | Sensitive | EL |
---|---|---|---|---|---|
record.value.field | the name of the numeric field to sample | record_value | |||
record.time.field | the name of the time field to sample | record_time | |||
sampling.algorithm | the implementation of the algorithm | none, lttb, average, first_item, min_max, mode_median | null | ||
sampling.parameter | the parameter of the algorithm | null
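An illustrative LTTB downsampling configuration (the parameter value is an example; its meaning depends on the chosen algorithm):
```yaml
- processor: downsampler
  component: com.hurence.logisland.processor.SampleRecords
  type: processor
  documentation: reduce the number of time series points before storage
  configuration:
    record.value.field: record_value
    record.time.field: record_time
    sampling.algorithm: lttb
    sampling.parameter: 100
```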
SelectDistinctRecords¶
Keep only distinct records based on a given field
Class¶
com.hurence.logisland.processor.SelectDistinctRecords
Tags¶
record, fields, remove, delete
SendMail¶
The SendMail processor is aimed at sending an email (like for instance an alert email) from an incoming record. There are three ways an incoming record can generate an email according to the special fields it must embed. Here is a list of the record fields that generate a mail and how they work:
- mail_text: this is the simplest way to generate a mail. If present, this field means to use its content (value) as the payload of the mail to send. The mail is sent in text format if there is only this special field in the record. Otherwise, used with either mail_html or mail_use_template, the content of mail_text is the alternative text to the HTML mail that is generated.
- mail_html: this field specifies that the mail should be sent as HTML and the value of the field is mail payload. If mail_text is also present, its value is used as the alternative text for the mail. mail_html cannot be used with mail_use_template: only one of those two fields should be present in the record.
- mail_use_template: If present, this field specifies that the mail should be sent as HTML and the HTML content is to be generated from the template in the processor configuration key html.template. The template can contain parameters which must also be present in the record as fields. See documentation of html.template for further explanations. mail_use_template cannot be used with mail_html: only one of those two fields should be present in the record.
If allow_overwrite configuration key is true, any mail.* (dot format) configuration key may be overwritten with a matching field in the record of the form mail_* (underscore format). For instance if allow_overwrite is true and mail.to is set to config_address@domain.com, a record generating a mail with a mail_to field set to record_address@domain.com will send a mail to record_address@domain.com.
Apart from error records (when it is unable to process the incoming record or to send the mail), this processor is not expected to produce any output records.
Class¶
com.hurence.logisland.processor.SendMail
Tags¶
smtp, email, e-mail, mail, mailer, sendmail, message, alert, html
Properties¶
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
Name | Description | Allowable Values | Default Value | Sensitive | EL |
---|---|---|---|---|---|
debug | Enable debug. If enabled, debug information are written to stdout. | false | |||
smtp.server | FQDN, hostname or IP address of the SMTP server to use. | null | |||
smtp.port | TCP port number of the SMTP server to use. | 25 | |||
smtp.security.username | SMTP username. | null | |||
smtp.security.password | SMTP password. | null | |||
smtp.security.ssl | Use SSL under SMTP or not (SMTPS). Default is false. | false | |||
mail.from.address | Valid mail sender email address. | null | |||
mail.from.name | Mail sender name. | null | |||
mail.bounce.address | Valid bounce email address (where error mail is sent if the mail is refused by the recipient server). | null | |||
mail.replyto.address | Reply to email address. | null | |||
mail.subject | Mail subject. | [LOGISLAND] Automatic email | |||
mail.to | Comma separated list of email recipients. If not set, the record must have a mail_to field and allow_overwrite configuration key should be true. | null | |||
allow_overwrite | If true, allows to overwrite processor configuration with special record fields (mail_to, mail_from_address, mail_from_name, mail_bounce_address, mail_replyto_address, mail_subject). If false, special record fields are ignored and only processor configuration keys are used. | true | |||
html.template | HTML template to use. It is used when the incoming record contains a mail_use_template field. The template may contain some parameters. The parameter format in the template is of the form ${xxx}. For instance ${param_user} in the template means that a field named param_user must be present in the record and its value will replace the ${param_user} string in the HTML template when the mail is sent. If some parameters are declared in the template, every one of them must be present in the record as fields, otherwise the record will generate an error record. If an incoming record contains a mail_use_template field, a template must be present in the configuration and the HTML mail format will be used. If the record also contains a mail_text field, its content will be used as an alternative text message to be used in the mail reader program of the recipient if it does not support HTML. | null
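A hedged configuration sketch for a simple alerting setup; all addresses and the SMTP host are placeholders.
```yaml
- processor: mailer
  component: com.hurence.logisland.processor.SendMail
  type: processor
  documentation: send an alert mail for records carrying a mail_text field
  configuration:
    smtp.server: smtp.example.com
    smtp.port: 25
    mail.from.address: alerts@example.com
    mail.from.name: Logisland
    mail.bounce.address: bounces@example.com
    mail.to: ops@example.com
    allow_overwrite: true
```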
SplitText¶
This is a processor that is used to split a String into fields according to a given Record mapping
Class¶
com.hurence.logisland.processor.SplitText
Tags¶
parser, regex, log, record
Properties¶
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
Name | Description | Allowable Values | Default Value | Sensitive | EL |
---|---|---|---|---|---|
value.regex | the regex to match for the message value | null | |||
value.fields | a comma separated list of fields corresponding to matching groups for the message value | null | |||
key.regex | the regex to match for the message key | .* | |||
key.fields | a comma separated list of fields corresponding to matching groups for the message key | record_raw_key | |||
record.type | default type of record | record | |||
keep.raw.content | do we add the initial raw content ? | true | |||
timezone.record.time | what is the time zone of the string formatted date for ‘record_time’ field. | UTC |
Dynamic Properties¶
Dynamic Properties allow the user to specify both the name and value of a property.
Name | Value | Description | EL |
---|---|---|---|
alternative regex & mapping | another regex that could match | this regex will be tried if the main one has not matched. It must be in the form alt.value.regex.1 and alt.value.fields.1 | true |
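For illustration, here is a sketch of an Apache access log mapping (the regular expression and field list are examples of the kind of mapping this processor expects):
```yaml
- processor: apache_parser
  component: com.hurence.logisland.processor.SplitText
  type: parser
  documentation: split a raw apache log line into typed fields
  configuration:
    value.regex: (\S+)\s+(\S+)\s+(\S+)\s+\[([\w:\/]+\s[+\-]\d{4})\]\s+"(\S+)\s+(\S+)\s*(\S*)"\s+(\S+)\s+(\S+)
    value.fields: src_ip,identd,user,record_time,http_method,http_query,http_version,http_status,bytes_out
    record.type: apache_log
    keep.raw.content: true
```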
SplitTextMultiline¶
No description provided.
Class¶
com.hurence.logisland.processor.SplitTextMultiline
Tags¶
None.
Properties¶
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
Name | Description | Allowable Values | Default Value | Sensitive | EL |
---|---|---|---|---|---|
regex | the regex to match | null | |||
fields | a comma separated list of fields corresponding to matching groups | null | |||
event.type | the type of event | null |
SplitTextWithProperties¶
This is a processor that is used to split a String into fields according to a given Record mapping
Class¶
com.hurence.logisland.processor.SplitTextWithProperties
Tags¶
parser, regex, log, record
Properties¶
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
Name | Description | Allowable Values | Default Value | Sensitive | EL |
---|---|---|---|---|---|
value.regex | the regex to match for the message value | null | |||
value.fields | a comma separated list of fields corresponding to matching groups for the message value | null | |||
key.regex | the regex to match for the message key | .* | |||
key.fields | a comma separated list of fields corresponding to matching groups for the message key | record_raw_key | |||
record.type | default type of record | record | |||
keep.raw.content | do we add the initial raw content ? | true | |||
properties.field | the field containing the properties to split and treat | properties |
Dynamic Properties¶
Dynamic Properties allow the user to specify both the name and value of a property.
Name | Value | Description | EL |
---|---|---|---|
alternative regex & mapping | another regex that could match | this regex will be tried if the main one has not matched. It must be in the form alt.value.regex.1 and alt.value.fields.1 | true |