From the course: Low Code/No-Code Data Literacy with KNIME: From Basic to Advanced

Data cleaning with KNIME: How to filter rows

(mellow music) - In this video, you will learn how to filter rows from a table in KNIME Analytics Platform using various criteria. Let's jump into it. You have a sales table where each row contains the data for a sales contract. There is the sold product, the country of sale, the date of the contract, the number of items, the amount of money generated by this contract, and whether the transaction was paid by credit card or not. Some columns are of type string and some columns are of type integer. Some other columns also have missing values, like here, for example. First, you want to extract all the sales made in a specific country. Thus, open the configuration of the row filter and specify as the column to test the column country. You can now specify a pattern to be matched for this column, for example, Germany. Close the configuration and execute the node. The output provides all the rows matching the specified pattern, Germany, in the selected column, country. You can also specify more complex patterns by making use of regular expressions and wildcards. For example, specify U* as a pattern and select the wildcard flag to filter all the data rows whose country starts with the letter U. Note that you can also reverse the filtering. Select exclude rows by attribute value to select, in this case, all the data rows whose country does not start with the letter U. Now, let's filter rows based on their values into the integer columns. For example, you want to extract sales records according to their quantity, which is the number of sold products. Change the column to test to quantity and use the range checking criteria to specify a lower bound. For example, 2. The row filter now selects only the data rows whose quantity is greater or equal than 2. To select the rows where quantity is exactly 2, you can set 2 in the upper bound as well. Intuitively, you can also specify a higher upper bound, for example, 6. The result now contains all the sales data whose amount is between 2 and 6. As for the pattern matching, you can reverse the filtering and exclude the rows where quantity is in the defined range. Sometimes it is necessary to select all data rows where a field is missing. Let's select as the column to test the column card and select the last matching criterion only missing value match. Similar as before, you can choose to include all the rows with missing value or exclude them. In some cases, you know exactly which row you want to keep. In this case, there is no need for a matching criteria. It's enough to say from row X to row Y. This is possible by selecting, include or exclude rows by number and entering the decide range. For example, from row number 5 to row number 10. Finally, if you want to keep both included and excluded rows, you can use the row splitter node. This node has exactly the same configuration window as the row filter, but features two output ports, one for the included and one for the excluded rows. In this video, you have learned how to filter rows in analytics platform using various criteria. Note that if you want to perform more complex filtering or are dealing with other data types, you can use more advanced nodes such as the rule-based rule filter, the reference rule filter, and the date and time based rule filter. (upbeat music)

Contents