LangChain4j AIServices Gotcha

I recently filed a bug ticket for quarkus-langchain4j. It's possible that I'm "holding it wrong", but even if this doesn't turn out to be a bug, I want to document it in case anyone else trips over it. The complete code I'm going to describe can be found here.

In a previous post, I talked about how excited I was about langchain4j. And I still am, but as I venture deeper I've hit some behaviors I didn't expect based on my read of the docs. First, I'll implement an example that works as expected. I've implemented my examples as tests, simply to make them easier to run in IntelliJ. The working example has three files:

package com.wininger.working;

import dev.langchain4j.service.UserMessage;

public interface WorkingService {
  public WorkingServiceResponse converse(@UserMessage String prompt);
}

package com.wininger.working;

import dev.langchain4j.model.output.structured.Description;

public record WorkingServiceResponse(
  @Description("A response to the users question") String response
) {}

package com.wininger.working;

import dev.langchain4j.model.chat.request.ResponseFormat;
import dev.langchain4j.model.chat.request.ResponseFormatType;
import dev.langchain4j.model.ollama.OllamaChatModel;
import dev.langchain4j.service.AiServices;
import dev.langchain4j.service.output.JsonSchemas;
import io.quarkus.test.junit.QuarkusTest;
import org.junit.jupiter.api.Test;

@QuarkusTest
public class WorkingTest {
  @Test
  void showsTheSystemWorkingAsExpected() {
    final var model = buildModel();
    final var service = AiServices.builder(WorkingService.class)
      .chatModel(model)
      .chatRequestTransformer(req -> {
        // should always be 1
        System.out.println("num messages: " + req.messages().size());
        return req;
      })
      .build();

    // tell it your name
    service.converse("Hello my name is bob");

    final var result = service.converse("What is my name?");

    // It should not know
    System.out.println("result: " + result);
  }

  private OllamaChatModel buildModel() {
    final var responseFormat = ResponseFormat.builder()
      .type(ResponseFormatType.JSON)
      .jsonSchema(JsonSchemas.jsonSchemaFrom(WorkingServiceResponse.class).get())
      .build();

    return OllamaChatModel.builder()
      .modelName("gemma3:4b")
      .baseUrl("http://localhost:11434/")
      .responseFormat(responseFormat)
      .build();
  }
}

In particular, notice this bit:

    final var service = AiServices.builder(WorkingService.class)
      .chatModel(model)
      .chatRequestTransformer(req -> {
        // should always be 1
        System.out.println("num messages: " + req.messages().size());
        return req;
      })
      .build();

    // tell it your name
    service.converse("Hello my name is bob");

    final var result = service.converse("What is my name?");

    // It should not know
    System.out.println("result: " + result);

We create a new service based on the WorkingService interface and attach a chatRequestTransformer so we can inspect the messages being sent to the model. We then invoke converse twice: the first call states "Hello my name is bob" and the second asks "What is my name?". Out of the box, these services should not retain any state, so the model should have no memory of the first call. If we run this test, we'll find that's true: each time, the chatRequestTransformer logs num messages: 1, and the model explains that it does not know our name.
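For reference, the console output looks something like this (the num messages lines come from our transformer; the exact wording of the model's answer will vary from run to run):

num messages: 1
num messages: 1
result: WorkingServiceResponse[response=I don't know your name.]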

If we change one small thing, this breaks down. The documentation indicates that you can annotate your service methods with templated user messages or system messages. Let's create a version of our service that adds some surrounding context to our user messages:

package com.wininger.not_working;

import com.wininger.working.WorkingServiceResponse;
import dev.langchain4j.service.UserMessage;

public interface NotWorkingService {
  @UserMessage("Answer the question asked {{it}}")
  public WorkingServiceResponse converse(String prompt);
}
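The test is unchanged apart from the interface handed to the builder; here's a sketch of the relevant portion, reusing the buildModel() helper from WorkingTest:

final var service = AiServices.builder(NotWorkingService.class)
  .chatModel(buildModel())
  .chatRequestTransformer(req -> {
    System.out.println("num messages: " + req.messages().size());
    return req;
  })
  .build();

// tell it your name
service.converse("Hello my name is bob");

final var result = service.converse("What is my name?");
System.out.println("result: " + result);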

If we run the same test using this version of the service, something very strange happens:

num messages: 1
num messages: 3
result: WorkingServiceResponse[response=Your name is Bob.]

I've also reproduced this by simply adding a @SystemMessage annotation. The count of three on the second call suggests the first user message and the model's reply are being carried along with the new question, which is how the model learns my name. This makes the generated services very hard to use in production.
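For reference, the @SystemMessage variant looks like this (a sketch; the interface name is mine, the rest mirrors NotWorkingService):

package com.wininger.not_working;

import com.wininger.working.WorkingServiceResponse;
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;

public interface AlsoNotWorkingService {
  @SystemMessage("You are a helpful assistant")
  public WorkingServiceResponse converse(@UserMessage String prompt);
}

So far, the workaround I've found for my image labeling app is to add a request transformer that drops old messages: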

final ImageInfoFromDescriptionService service =
  AiServices.builder(ImageInfoFromDescriptionService.class)
    .chatModel(imageInfoFromDescriptionModel)
    .chatRequestTransformer(req -> {
      if (req.messages().size() <= 2) {
        return req;
      }

      // work around: drop everything but the two most recent messages
      System.out.println("dropping stale messages");
      final List<ChatMessage> trimmedMessages = req.messages().subList(
          req.messages().size() - 2,
          req.messages().size()
      );

      // rebuild the request with only the trimmed messages, keeping
      // the original request parameters
      return ChatRequest.builder()
          .messages(trimmedMessages)
          .parameters(req.parameters())
          .build();
    })
    .build();
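Rebuilding the request with ChatRequest.builder() and the original parameters reflects my reading of the current API; the essential part is simply returning a request whose message list holds only the most recent messages. It's crude, but it restores the one-shot behavior I expected in the first place.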

I'll update this blog if I hear back from the project. In the meantime, I hope this helps someone.