Are you ready: Logging

Saturday, 15 July 2023

More choice using ChoiceFormat

The conversion of numbers to text is a frequently encountered problem for engineers. It manifests in various scenarios, and some of the most prevalent examples include:

Converting a number to a day of the week
Converting a number to a month
Transforming a status into user-friendly text

Now, let's explore some solutions for addressing this problem, using the conversion of a number to a day of the week as an explanatory example.

- IF/Else/Switch

public static String toDayOfWeek(int value) {

    if (value == 1) {
        return "MON";
    } else if (value == 2) {
        return "TUE";
    } else if (value == 3) {
        return "WED";
    } else if (value == 4) {
        return "THU";
    } else if (value == 5) {
        return "FRI";
    } else if (value == 6) {
        return "SAT";
    } else if (value == 7) {
        return "SUN";
    }
    return "You are on moon";

}

While this solution may serve as a decent starting point, it is worth noting that even your grandfather might not approve of it.

- Maps

public static String toDayOfWeek(int value) {

    Map<Integer, String> dayOfWeek = new HashMap<>() {
        {
            int index = 1;
            put(index++, "MON");
            put(index++, "TUE");
            put(index++, "WED");
            put(index++, "THU");
            put(index++, "FRI");
            put(index++, "SAT");
            put(index++, "SUN");
        }
    };

    return dayOfWeek
            .getOrDefault(value, "You are on moon");

}

This solution appears to be an improvement and can be considered a good option, especially when the key is dynamic. However, it is important to note that maintaining this solution can become challenging over time.
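One way to ease the maintenance burden is to build the map once instead of on every call. The following is a minimal sketch, assuming Java 9+ for `Map.of`; the class name `DayOfWeekMapSketch` is made up for illustration.

```java
import java.util.Map;

public final class DayOfWeekMapSketch {

    // Built once and immutable; Map.of accepts up to ten key/value pairs.
    private static final Map<Integer, String> DAY_OF_WEEK = Map.of(
            1, "MON", 2, "TUE", 3, "WED", 4, "THU", 5, "FRI", 6, "SAT", 7, "SUN");

    static String toDayOfWeek(int value) {
        return DAY_OF_WEEK.getOrDefault(value, "You are on moon");
    }

    public static void main(String[] args) {
        System.out.println(toDayOfWeek(3));  // WED
        System.out.println(toDayOfWeek(42)); // You are on moon
    }
}
```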

- Enums

public enum DayOfWeek {
    MONDAY,
    TUESDAY,
    WEDNESDAY,
    THURSDAY,
    FRIDAY,
    SATURDAY,
    SUNDAY;
    private static final DayOfWeek[] ENUMS = DayOfWeek.values();

    public static DayOfWeek of(int dayOfWeek) {
        return ENUMS[dayOfWeek - 1];
    }
}

This approach seems quite elegant and aligns with the way JDK handles similar scenarios. It offers several advantages, such as leveraging data types to ensure compile-time safety. This not only enhances maintainability but also provides the benefit of catching errors during compilation. However, one potential drawback of this approach is that it can pose challenges when it comes to extending functionality due to the strong type safety constraints.
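Note that the `of` method above throws an ArrayIndexOutOfBoundsException for values outside 1 to 7. For comparison, here is a small sketch that leans on the JDK's own `java.time.DayOfWeek`, which rejects out-of-range values with a `DateTimeException`. The wrapper class and the fallback text are illustrative choices, not part of the JDK.

```java
import java.time.DateTimeException;
import java.time.DayOfWeek;
import java.time.format.TextStyle;
import java.util.Locale;

public final class JdkDayOfWeekSketch {

    /** Maps 1..7 (Monday..Sunday) to a short English name, with a fallback for bad input. */
    static String toDayOfWeek(int value) {
        try {
            return DayOfWeek.of(value).getDisplayName(TextStyle.SHORT, Locale.ENGLISH);
        } catch (DateTimeException outOfRange) {
            return "You are on moon";
        }
    }

    public static void main(String[] args) {
        System.out.println(toDayOfWeek(1));  // Mon
        System.out.println(toDayOfWeek(7));  // Sun
        System.out.println(toDayOfWeek(42)); // You are on moon
    }
}
```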

- Choice Format

ChoiceFormat (java.text.ChoiceFormat) is not a new feature; it has been part of the JDK since version 1.1, yet it is often overlooked, and it offers intriguing solutions for tackling this problem. Before delving into the intricacies of how it works, let's examine some code examples to get a better understanding.

public static String toDayOfWeek(int value) {
    double[] limits = {1, 2, 3, 4, 5, 6, 7};
    String[] formats = {"Mon", "Tue", "Wed", "Thur", "Fri", "Sat", "Sun"};

    ChoiceFormat form = new ChoiceFormat(limits, formats);
    return form.format(value);
}

This solution solves the problem by employing a straightforward approach: using pairs of arrays or vectors to map numbers to their corresponding strings.

At first glance, it may not seem particularly remarkable, resembling a Map structure where the keys are represented by one array and the values by another. However, the true test begins when we pass values that are not present in the array.

Now, let's speculate. What do you think this function would return if we were to pass 0 or 100? One possibility could be "Undefined," which would hold true if this were JavaScript. Another option could be null or a NullPointerException, which is a close guess given your familiarity with Java. However, this is where the solution gets interesting.

Let's look at the output for 1, 2, 0 and 100:

1 -> Mon
2 -> Tue
0 -> Mon
100 -> Sun

The output obtained from this solution provides some valuable insights into how ChoiceFormat operates. For example, when we pass 0, it returns "Mon," and when we pass 100, it returns "Sun."

Based on these results, you might start getting an idea of how ChoiceFormat functions. It relies on a few key elements:

Ascending limits array: The limits array is arranged in ascending order, defining the intervals.
Format array: This array has the same size as the limits array and contains the corresponding text representations for each interval.
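Interval behaviour: The values in the limits array define half-open intervals, meaning that the lower limit is inclusive while the upper limit is exclusive. A value below the first limit maps to the first format, and a value at or above the last limit maps to the last format.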

These factors play a crucial role in determining the appropriate text representation based on the input value within the defined intervals.
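The lookup rule described above can be sketched in a few lines. The helper `pick` below is hypothetical and only mirrors the documented behaviour (choose the last limit that is less than or equal to the value, falling back to the first format); it is not the JDK's implementation. It is compared against a real ChoiceFormat for a few inputs, including a fractional one.

```java
import java.text.ChoiceFormat;

public final class ChoiceLookupSketch {

    /** Hypothetical helper: returns the format of the last limit that is <= value. */
    static String pick(double[] limits, String[] formats, double value) {
        int index = 0; // values below the first limit fall back to the first format
        for (int i = 0; i < limits.length; i++) {
            if (value >= limits[i]) {
                index = i;
            }
        }
        return formats[index];
    }

    public static void main(String[] args) {
        double[] limits = {1, 2, 3, 4, 5, 6, 7};
        String[] formats = {"Mon", "Tue", "Wed", "Thur", "Fri", "Sat", "Sun"};
        ChoiceFormat form = new ChoiceFormat(limits, formats);

        // Each line prints the ChoiceFormat result next to the hand-written rule.
        for (double value : new double[] {0, 1, 2.5, 7, 100}) {
            System.out.println(value + " -> " + form.format(value)
                    + " / " + pick(limits, formats, value));
        }
    }
}
```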

Let's look at the half-open interval match with the function below.

public static String weekDayOrWeekend(int value) {
    double[] limits = {1, 6};
    String[] formats = {"WeekDay", "Weekend"};

    ChoiceFormat form = new ChoiceFormat(limits, formats);
    return form.format(value);
}

This particular implementation exhibits an interesting behavior: for values less than 6, the function returns "WeekDay," while for values 6 and above, it returns "Weekend."

Isn't it fascinating how ChoiceFormat manages to accomplish this range-based search and deliver the appropriate result? It's remarkable how this small utility class can perform such a useful trick.

Let's consider one more simple example before delving into the greater capabilities of this utility class.

public static String workDays(int value) {
    double[] limits = {1, 2, 5, 6};
    String[] formats = {"Monday Blues", "WorkHard", "It is Friday!!", "Relax"};

    ChoiceFormat form = new ChoiceFormat(limits, formats);
    return form.format(value);
}

This should give a fair idea that this class will be useful in many places.

Let's look at some applications.

- Better log messages

public static String files(int value) {
    double[] limits = {0, 1, 2};
    String[] formats = {"No files", "One file", "Many files"};

    ChoiceFormat form = new ChoiceFormat(limits, formats);
    return form.format(value);
}

System.out.println("Found " + files(100)); // Found Many files
System.out.println("Found " + files(0));   // Found No files
System.out.println("Found " + files(1));   // Found One file

- Conditional Log message

public static String formatMessage(String format, int value) {
    ChoiceFormat form = new ChoiceFormat(format);
    return form.format(value);
}

// "#" matches from the limit upwards, "<" matches values strictly greater than the limit
String format = "0#no files|1#one file|2#two files|2<more than 2";
System.out.println(formatMessage(format, 2)); //two files
System.out.println(formatMessage(format, 10)); //more than 2
System.out.println(formatMessage(format, 0)); //no files
System.out.println(formatMessage(format, 1)); //one file

This example showcases the power of advanced string interpolation by utilizing rules embedded within the format string.

By leveraging this technique, we can define rules directly within the format string itself, which provides a flexible and concise approach to handle various scenarios.
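The two separators used in such rules behave differently at the boundary, which a short sketch can make concrete: `#` includes the limit itself, while `<` starts just above it. The class name and the pattern below are illustrative; `ChoiceFormat.nextDouble` is the JDK method that expresses the same "just above" limit when arrays are used instead of a pattern.

```java
import java.text.ChoiceFormat;

public final class ChoicePatternSketch {

    // "1#" covers exactly 1 and up; "1<" takes over for anything strictly greater than 1.
    static final ChoiceFormat FILES = new ChoiceFormat("0#no files|1#one file|1<many files");

    // The same rules with arrays: nextDouble(1) is the smallest double greater than 1.
    static final ChoiceFormat FILES_FROM_ARRAYS = new ChoiceFormat(
            new double[] {0, 1, ChoiceFormat.nextDouble(1)},
            new String[] {"no files", "one file", "many files"});

    public static void main(String[] args) {
        System.out.println(FILES.format(0));   // no files
        System.out.println(FILES.format(1));   // one file
        System.out.println(FILES.format(1.5)); // many files
        System.out.println(FILES.format(-5));  // no files (below the first limit)
        System.out.println(FILES_FROM_ARRAYS.format(2)); // many files
    }
}
```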

- Parameterised Conditional Log message

There are multiple ways to do this; let's look at a few examples.

public static ChoiceFormat usingPair() {
    double[] priceLimits = {0.0, 10.0, 50.0, 100.0};
    String[] priceFormats = {
            "The item is not available",
            "The item is on sale for {0}",
            "The item is moderately priced at {0}",
            "The item is expensive at {0}"
    };
    return new ChoiceFormat(priceLimits, priceFormats);
}

public static ChoiceFormat usingStringLiteral() {
    return new ChoiceFormat(
            "0#The item is not available |10#The item is on sale for {0} |50#The item is moderately priced at {0} |100#The item is expensive at {0}");
}

public static ChoiceFormat usingStringRules() {

    String rules = String.join(" |",
            "0#The item is not available",
            "10#The item is on sale for {0}",
            "50#The item is moderately priced at {0}",
            "100#The item is expensive at {0}");

    return new ChoiceFormat(rules);
}

ChoiceFormat can be created using any of the methods mentioned above, each with its own advantages and considerations. However, some methods may be easier to maintain and less error-prone than others.

Among the options, if I were to choose one, I would prefer using the last method demonstrated, which involves using string rules. This method provides greater flexibility and simplicity in defining the rules for the ChoiceFormat. By using string rules, you can easily specify the mappings between input values and their corresponding text representations in a concise and readable manner. This approach often results in code that is easier to understand, modify, and maintain.

The above format can be used as below:

ChoiceFormat priceFormat = usingStringRules();

double price = 120;
Object[] formatArguments = {price};
String formattedPrice = MessageFormat.format(priceFormat.format(price), formatArguments);
System.out.println(formattedPrice); // The item is expensive at 120

Just imagine how this capability could be used by a logging framework!
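As a hint of what that could look like, MessageFormat can embed the choice rules directly in a message pattern through the `{0,choice,...}` element, so the two-step formatting shown earlier collapses into one call. This is a minimal sketch; the class name, method name and message text are illustrative.

```java
import java.text.MessageFormat;
import java.util.Locale;

public final class ChoiceInMessageSketch {

    // The choice element selects a sub-pattern; the last branch formats the count itself.
    static final String PATTERN =
            "There {0,choice,0#are no files|1#is one file|1<are {0,number,integer} files}.";

    static String describe(int fileCount) {
        return new MessageFormat(PATTERN, Locale.ENGLISH).format(new Object[] {fileCount});
    }

    public static void main(String[] args) {
        System.out.println(describe(0)); // There are no files.
        System.out.println(describe(1)); // There is one file.
        System.out.println(describe(5)); // There are 5 files.
    }
}
```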

ChoiceFormat widens the range of choices available for message formatting. By incorporating ChoiceFormat into your code, you can introduce a multitude of options, enriching the formatting possibilities.

The versatility of ChoiceFormat allows you to define and customize a wide array of choices, each with its own designated format. This flexibility enables you to create dynamic and adaptive messages that cater to different input values.

With ChoiceFormat at your disposal, you can enhance your message formatting capabilities, opening up new avenues for crafting comprehensive and adaptable output.

Saturday, 26 May 2018

Custom Logs in Apache Spark

Have you ever felt the frustration of a Spark job that runs for hours and then fails due to an infra issue?
You learn about the failure very late and waste a couple of hours on it, and it hurts even more when the Spark UI logs are not available for a postmortem.

You are not alone!

In this post I will go over how to enable your own custom logger that works well with the Spark logger.
This custom logger will collect whatever information is required to go from reactive to proactive monitoring.
There is no need to set up extra logging infra for this.

Spark 2.X logs through the Slf4j abstraction, and the setup in this post uses the logback binding.

Let's start with the logging basics: how to get a logger instance in Spark jobs or applications.

val _LOG = LoggerFactory.getLogger(this.getClass.getName)

It is that simple, and now your application is using the same log library and settings that Spark is based on.

Now, to do something more meaningful, we have to inject our custom logger, which will collect info and write it to Elasticsearch, post it to some REST endpoint or send alerts.

Let's go step by step to do this.

Build custom log appender
Since this setup uses logback, we have to write a logback appender.

Code snippet for custom logback logger

This is a very simple logger that counts messages per thread; all you have to do is override the append function.

This type of logger can do anything, like writing to a database, sending to a REST endpoint or alerting.

Enable logger
To use the new logger, create a logback.xml file and add an entry for the new logger.
This file can be packed in the shaded jar or specified as a runtime parameter.

Sample logback.xml
This config file adds MetricsLogbackAppender as METRICS

<appender name="METRICS" class="micro.logback.MetricsLogbackAppender"/>

Next, enable it for the packages/classes that should use it

<logger level="info" name="micro" additivity="true">
    <appender-ref ref="METRICS" />
</logger>
<logger level="info" name="org.apache.spark.scheduler.DAGScheduler" additivity="true">
    <appender-ref ref="METRICS" />
</logger>

You are done!

Any message logged from the 'micro' package or from the DAGScheduler class will use the new logger.
Using this technique, executor logs can also be captured, which becomes very useful when a Spark job is running on hundreds or thousands of executors.

Now it opens up lots of options, such as BI that shows all these messages in real time and allows the team to ask interesting questions or subscribe to alerts when things are not going well.

Caution: Make sure that this new logger is not slowing down application execution; making it asynchronous is recommended.
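One possible way to follow that advice, sketched here as an assumption and not as the configuration used in the repo, is to wrap the METRICS appender in logback's stock AsyncAppender and point the loggers at the wrapper. The name ASYNC_METRICS and the queue size are illustrative values.

```xml
<!-- Hypothetical wrapper: ASYNC_METRICS and the queue size are illustrative, not from the original config -->
<appender name="ASYNC_METRICS" class="ch.qos.logback.classic.AsyncAppender">
    <queueSize>1024</queueSize>
    <!-- do not block the logging thread; events are dropped when the queue is full -->
    <neverBlock>true</neverBlock>
    <appender-ref ref="METRICS" />
</appender>

<logger level="info" name="micro" additivity="true">
    <appender-ref ref="ASYNC_METRICS" />
</logger>
```

With this arrangement the application thread only enqueues the event, and the custom appender runs on the AsyncAppender's worker thread. The trade-off is that events can be lost under load or on abrupt shutdown.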

Get the insight at the right time and turn it into action.

The code used in this blog is available in the sparkmicroservices repo on GitHub.

I am interested in knowing what logging patterns you are using for Spark.

Tuesday, 25 December 2012

Are you using correct logging framework ?

A logging framework is the heart of every application: it helps in troubleshooting production issues, knowing how your application is being used, finding the bottlenecks in the application and many more.

Using the right logging framework is key; Wikipedia has a list of the well-known ones in the Java world.

So logging has a lot of benefits, but it also brings a lot of overhead, so you trade off that overhead against the benefit you get. It is interesting to measure the cost of this overhead. We all know it is big because logging is I/O bound, so we come up with the best strategy on what to log, how much to log, which framework is best, etc.

One problem with current logging frameworks is that not much work has been done on performance improvement, especially when it comes to taking advantage of multi-core architecture. Applications try to do more work to take advantage of multiple cores, which may result in more log messages, and since logging frameworks are not there yet, they don't scale.

Cost Of Log Message

Let's measure the time spent in logging.
I wrote a Java program that sums the elements of an array in parallel. The array is divided into chunks, and each task sums its chunk and logs a message before and after it processes the chunk. I have taken a simple computation problem because we want to measure the cost of logging.
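The original benchmark program is not reproduced here, so the following is only a rough sketch of the setup just described. The array size, the chunk size, the logger name and the temp-file handler are all assumptions, and there is no JIT warm-up, so the timings it prints are indicative at best.

```java
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.logging.FileHandler;
import java.util.logging.Logger;

public final class LoggingCostSketch {

    /** Sums data in parallel chunks; logs before and after each chunk when log is non-null. */
    static long parallelSum(int[] data, int chunkSize, Logger log) {
        ExecutorService pool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            List<Future<Long>> parts = new ArrayList<>();
            for (int start = 0; start < data.length; start += chunkSize) {
                final int from = start;
                final int to = Math.min(start + chunkSize, data.length);
                parts.add(pool.submit(() -> {
                    if (log != null) log.info("start chunk at " + from);
                    long sum = 0;
                    for (int i = from; i < to; i++) sum += data[i];
                    if (log != null) log.info("end chunk at " + from);
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> part : parts) total += part.get();
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] data = new int[4_000_000]; // assumed size
        Arrays.fill(data, 1);

        Logger log = Logger.getLogger("logging-cost");
        log.setUseParentHandlers(false); // keep console output out of the measurement
        FileHandler handler =
                new FileHandler(Files.createTempFile("logging-cost", ".log").toString());
        log.addHandler(handler);

        long t0 = System.nanoTime();
        parallelSum(data, 1_000, null); // no logging
        long t1 = System.nanoTime();
        parallelSum(data, 1_000, log);  // java.util.logging with FileHandler
        long t2 = System.nanoTime();
        handler.close();

        System.out.printf("without logging: %d ms, with logging: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}
```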

For benchmarking purposes, I do the calculation with and without logging to check how much time we spend on logging, and the numbers are crazy.

The JDK 1.4 logging framework (java.util.logging) is used for the benchmark. I only took the Java one because all the others, like log4j, Apache logging, SLF4J etc., are almost the same when they log messages to a file.

Details of Machine
OS : Windows 7, 64 bit
Processor : Intel i5, 2.40 GHz
No Of Cores : 4

The numbers are really crazy: once you add logging, performance drops by 6X to 10X. At that cost it hardly seems worth using logging.

More time is spent in logging than doing real work, but the fact of life is that we still need logging, and without it troubleshooting production issues would be a nightmare.

Why it is so slow!

To find out why it is super slow, we have to dive into the code, so let's have a look at the default handler (FileHandler) that the Java logging framework provides.

FileHandler has a synchronized function that performs the core logging, and we know what kind of performance degradation that causes in a multi-threaded environment, so that is the number 1 reason for the slowness.

public synchronized void publish(LogRecord record) {
    if (!isLoggable(record)) {
        return;
    }
    super.publish(record);
    flush();
    .........
}

The other reason for the slowness is that an I/O operation is performed for each and every log call: the handler flushes after every record, so nothing is effectively buffered. I am not sure why it is done like that; maybe it is to keep things simple, and all this simplicity adds to the performance cost.

All of these techniques result in a lot of contention, and we see a big degradation in performance from harmless-looking code.

What can we do now ?

So we have to find some alternative framework that doesn't make our application 10X slower!

We have to do a couple of quick things.

- Get rid of synchronized

- Try to add some I/O buffering

- Use Async for logging

So I added these things, created a simple logging util and measured its cost.

The light green one is the logger that has some of these improvements, and it is amazing to see that we are back to the same performance: it does not add any significant overhead, and it is almost as if there were no logging in the code.

So just by switching the logging framework we can get a 6X to 10X benefit.

What gives these performance improvements

- ConcurrentLinkedQueue is used to store all the log messages. ConcurrentLinkedQueue is a lock-free queue, but it is not bounded. A bounded queue can be used to reduce the risk when the producer is faster than the consumer.

- Buffering is added for I/O operations, which reduces the number of I/O operations.

- A preallocated ByteBuffer is used to keep all the messages that need to be written to the file.

- A lock-based wait strategy is used when there is no message in the queue. A better waiting strategy can be used based on requirements, but for logs I think lock-based waiting is fine.
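To make the list above concrete, here is a minimal sketch of such a logger. It is not the code from the linked repository: the class name, the buffer size and the 10 ms timed wait are assumptions, and a plain BufferedWriter stands in for the preallocated ByteBuffer. Callers enqueue on a ConcurrentLinkedQueue, and a single background thread drains it, writes through the buffer and waits on a lock-based condition when the queue is empty.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch: callers enqueue lock-free, one background thread writes with buffering. */
public final class AsyncLogSketch implements AutoCloseable {
    private final ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<>();
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notEmpty = lock.newCondition();
    private final BufferedWriter out;
    private final Thread writer;
    private volatile boolean running = true;
    private volatile boolean waiting = false;

    public AsyncLogSketch(Writer target) {
        this.out = new BufferedWriter(target, 64 * 1024); // buffer size is an arbitrary choice
        this.writer = new Thread(this::drainLoop, "async-log-writer");
        this.writer.setDaemon(true);
        this.writer.start();
    }

    /** Called by application threads; takes the lock only when the writer is parked. */
    public void log(String message) {
        queue.add(message);
        if (waiting) {
            wakeWriter();
        }
    }

    private void wakeWriter() {
        lock.lock();
        try {
            notEmpty.signal();
        } finally {
            lock.unlock();
        }
    }

    private void drainLoop() {
        try {
            while (running || !queue.isEmpty()) {
                String message = queue.poll();
                if (message != null) {
                    out.write(message);
                    out.newLine();
                    continue;
                }
                out.flush(); // queue is empty: flush once for the whole batch
                lock.lock();
                try {
                    waiting = true;
                    if (queue.isEmpty() && running) {
                        // timed wait as a safety net against a missed signal
                        notEmpty.await(10, TimeUnit.MILLISECONDS);
                    }
                } finally {
                    waiting = false;
                    lock.unlock();
                }
            }
            out.flush();
        } catch (IOException e) {
            e.printStackTrace(); // a real logger would need a proper failure policy
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Stops the writer after it has drained the queue; messages logged afterwards are lost. */
    @Override
    public void close() {
        running = false;
        wakeWriter();
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        try (AsyncLogSketch log = new AsyncLogSketch(new OutputStreamWriter(System.out))) {
            log.log("hello from " + Thread.currentThread().getName());
        }
    }
}
```

Because the queue is unbounded, a burst of messages grows memory instead of slowing the callers down. Swapping in a bounded queue, as suggested above, would trade that for back-pressure or dropped messages.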

Conclusion

So the time has come to re-think the logging framework in use; just by using the correct logging framework we can get a big performance boost. To make applications more responsive, we have to look for alternatives. This is very important in the low latency and high throughput space.

The example that I used is very basic and doesn't have all the features, but it can easily be extended to add them.

Link To Code

Code available @ github