Ansible collection for agentless VM application discovery — modular runtime detectors with structured JSON output


anaeem 7ac10c78e8 live-E2E: fix 2 more deploy bugs (CREATE DATABASE loop, Spring Boot datasource env) Full live run of pricing-svc through the bulletproofed pipeline caught two more: - provision_db CREATE DATABASE used a changed_when referencing bare rc/stdout which errors per-item inside a loop -> the task FAILED even though every CREATE succeeded. Switched to changed_when: true (idempotent; default failed_when still surfaces a real psql error) - the chart wired only generic DB_n_URL/USER/PASSWORD env, which Spring Boot does NOT read -> a migrated fat-jar fell back to its baked-in localhost datasource and shut down at startup. Chart 0.2.7 now aliases the first jdbc secret into SPRING_DATASOURCE_URL/USERNAME/PASSWORD for springboot_jar runtimes only RESULT: pricing-svc deployed end-to-end via the new code (fresh discover -> render -> source build -> build_run) is 1/1, /actuator/health db UP, /api/price returns real CRUD (70.00). 544 unit tests pass. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>		2026-06-13 05:08:21 +01:00
.devcontainer	Swap.2: Replace LiteLLM endpoint + model + auth with OpenRouter	2026-05-11 15:35:27 +01:00
.gitea/workflows	P3: multi-datasource + seed-once + credential-leak + chart fixes + blocking render CI	2026-06-13 03:37:04 +01:00
bootstrap/anaeem	docs: document AAP 2.6 unified-gateway + Gitea CI runner (end-to-end proven)	2026-06-11 20:36:58 +01:00
context	feat: VM→OpenShift build/run pipeline — Helm bundles, generated tests, MTA loop, post-merge deploy	2026-06-11 19:16:57 +01:00
docs	P5: adversarial hardening + completeness sweep + handoff docs	2026-06-13 04:24:05 +01:00
meta	docs: document AAP 2.6 unified-gateway + Gitea CI runner (end-to-end proven)	2026-06-11 20:36:58 +01:00
playbooks	feat: VM→OpenShift build/run pipeline — Helm bundles, generated tests, MTA loop, post-merge deploy	2026-06-11 19:16:57 +01:00
plugins	refactor(mta): foundation Python modules; rewrite phases 2 + 4 to use them	2026-05-24 19:14:31 +01:00
provision	feat: VM→OpenShift build/run pipeline — Helm bundles, generated tests, MTA loop, post-merge deploy	2026-06-11 19:16:57 +01:00
roles	live-E2E: fix 2 more deploy bugs (CREATE DATABASE loop, Spring Boot datasource env)	2026-06-13 05:08:21 +01:00
scripts	Make pipeline operational end-to-end (analyzer, template, autopair, docs)	2026-05-18 12:01:30 +01:00
tests	live-E2E: fix 3 real integration bugs the fixtures missed	2026-06-13 04:48:57 +01:00
.ansible-lint	Tier 2: Lint clean (0 failures), requirements.yml fix, var-naming doc	2026-05-11 00:29:29 +01:00
.apme.yml	Merge apme/l022-pipefail into v5-readability-bundle	2026-05-23 16:46:47 +01:00
.env.example	feat: VM→OpenShift build/run pipeline — Helm bundles, generated tests, MTA loop, post-merge deploy	2026-06-11 19:16:57 +01:00
.gitignore	feat: VM→OpenShift build/run pipeline — Helm bundles, generated tests, MTA loop, post-merge deploy	2026-06-11 19:16:57 +01:00
AGENTS.md	apme: wire .apme.yml + REST-based CI gate + agent docs	2026-05-23 00:25:06 +01:00
ansible.cfg	refactor(mta): foundation Python modules; rewrite phases 2 + 4 to use them	2026-05-24 19:14:31 +01:00
CHANGELOG.md	docs: document AAP 2.6 unified-gateway + Gitea CI runner (end-to-end proven)	2026-06-11 20:36:58 +01:00
CONTRIBUTING.md	P5: adversarial hardening + completeness sweep + handoff docs	2026-06-13 04:24:05 +01:00
execution-environment.yml	P0: bulletproof foundation — untrack .ansible mirror, lint.yml tripwires, tests/unit + EE test tooling	2026-06-13 02:25:32 +01:00
galaxy.yml	v3.0.0: Tomcat→MTA→LLM→validate pipeline driven by AAP workflow	2026-05-20 19:42:19 +01:00
inventory.ini.example	Stress-test fixes: scoring, Jinja guards, secrets, lint, docs	2026-05-09 22:36:42 +01:00
inventory.test-fleet.ini	feat: VM→OpenShift build/run pipeline — Helm bundles, generated tests, MTA loop, post-merge deploy	2026-06-11 19:16:57 +01:00
inventory.test-fleet.ini.example	feat: VM→OpenShift build/run pipeline — Helm bundles, generated tests, MTA loop, post-merge deploy	2026-06-11 19:16:57 +01:00
Makefile	P5: adversarial hardening + completeness sweep + handoff docs	2026-06-13 04:24:05 +01:00
README.md	docs: extend all end-to-end and AAP diagrams through merge → runner → JT 133 → running pod	2026-06-11 20:48:24 +01:00
requirements.yml	ci(aap): stop re-fetching EE-bundled collections on project sync (galaxy flakiness)	2026-06-11 20:02:52 +01:00

README.md

migration.discovery — Java VM to OpenShift in three roles

An Ansible collection that takes a Linux VM running a Java application and produces an OpenShift workload bundle (Containerfile + Helm chart values + tests) as a Gitea pull request. After the PR merges, it helm-installs the workload on the target cluster and verifies it.

The pipeline is three roles:

discover — SSH onto the host; find Java apps; emit one fact file per app.
containerize — render a Containerfile + Helm values.yaml (for the published migrated-app chart) + a generated test suite + MTA analysis, all from one fact file.
deploy — push the bundle as a Gitea PR on bundle/<slug>.

After the PR merges, build_run helm-installs the app into the migrated-apps namespace, provisions a per-app database on shared PostgreSQL, and verifies the running workload.

Everything else (fleet readiness reports, MTA inventory feed, Konveyor analyzer, SBOM, drift detection) is an add-on that consumes the same fact files. See docs/architecture/overview.md for the full picture.

Scope: Java applications only. Other runtimes (Node.js, Python, .NET, Go) are on the roadmap but not yet implemented.

Documentation Map

Document	Description
CHANGELOG.md	Version history and release notes
AGENTS.md	Coding conventions for AI and human contributors
Architecture
docs/architecture/overview.md	Authoritative architecture: three roles, the fact-file contract, the bundle layout, the Helm chart, MTA closed loop, build/run leg, AAP flow
docs/architecture/	Supplementary Mermaid diagrams (fleet report, detector extensibility, AAP integration, verdict decision tree)
docs/v5-showroom/	Top-down Showroom site (Antora/AsciiDoc) — for tech and non-tech readers
docs/SBOM.md	CycloneDX SBOM generation via syft
docs/SECURITY.md	Validator account lifecycle, secret hygiene, LLM redaction rules
Test fleet
docs/TEST_FLEET.md	Four-app test fleet (classic-shop, report-factory, ledger-ejb, pricing-svc) and provisioning
Runbooks
docs/runbooks/00-prereqs.md	Prerequisites before running anything
docs/runbooks/01-readiness-report.md	Generate the fleet readiness HTML
docs/runbooks/03-aap-setup.md	First-time AAP wiring (projects, inventories, credentials)
docs/runbooks/04-workflow-template.md	AAP workflow walkthrough (discover → fan-out → per-app Survey → deploy)
docs/runbooks/05-drift-detection.md	Post-deployment drift detection
docs/runbooks/06-decommission-97-98-107.md	Decommissioning legacy JT IDs
docs/runbooks/07-build-run.md	Post-merge build/run: helm-install, DB provisioning, verify
docs/runbooks/08-provision-fleet.md	Provision and manage the test-fleet KubeVirt VMs
docs/runbooks/09-aap-and-ci.md	AAP 2.6 access model (gateway vs controller), controller OAuth2 token, Gitea runner setup
Bootstrap
bootstrap/anaeem/AGENTS.md	Target-cluster one-shot bootstrap (namespace, postgres, OCI pull secret)
Archive
docs/archive/ARCHITECTURE_V3.md	Older architecture (kept for history)
docs/archive/ARCHITECTURE_CRITIQUE.md	Critique that motivated the current design

Quickstart

cat > inventory.ini <<EOF
[all]
vm1 ansible_host=10.0.1.10

[all:vars]
ansible_user=deploy
ansible_become=true
EOF

# 1. Scan the VM — emits <slug>.facts.json per detected app + discovered_apps.json
ansible-playbook -i inventory.ini playbooks/discover.yml

# 2. For one discovered app, pair to a source repo, containerize, push as a PR
ansible-playbook playbooks/per_app.yml \
  -e slug=vm1-tomcat \
  -e source_repo=https://git.arsalan.io/anaeem/my-app.git \
  -e source_branch=main

After step 2, a Gitea PR exists on branch bundle/vm1-tomcat with:

Containerfile — LLM-authored, runtime-aware, non-root (UID 185)
values.yaml — input to the published migrated-app Helm chart
Chart.ref.yaml — chart OCI ref + version pin
rendered/ — helm template preview
tests/ — container-structure-test, goss, http_checks, run_local.sh
MTA_ANALYSIS.md — mandatory-issue gate verdict
VERIFICATION.md — hadolint + helm lint + kubeconform evidence

The end-to-end flow under AAP

[RHDH step1]  ->  JT 107         ->  playbooks/discover.yml          (1 run, against inventory)
[AAP / EDA]   ->  fan_out        ->  playbooks/utils/fan_out.yml     (launches JT 98 per app)
[per-app Survey: source_repo, source_branch, source_subpath]         (the human gate)
[JT-perapp]   ->  per_app.yml   ->  pair -> containerize -> deploy   (Gitea PR)

[PR merge]    ->  build-on-merge.yml   (Gitea Actions runner, anaeem SNO)
                   docker build + push image to Harbor
                   POST launch → AAP JT 133

[JT 133]      ->  playbooks/build_run.yml  (EE: migration-ee with helm+oc)
                   helm-install -> provision DB -> helm test -> Route /health

The Gitea runner is a self-hosted gitea/act_runner:nightly-dind pod in namespace gitea-ci on the anaeem SNO (bootstrap: bootstrap/anaeem/20-gitea-act-runner.yaml). It launches JT 133 through the AAP controller's direct route (aap-controller-aap.apps.hammer.na-launch.com), because the gateway URL returns 404 for /api/v2/*. Auth is a controller-native OAuth2 token.

See docs/runbooks/09-aap-and-ci.md for the full access model and runner setup, and docs/architecture/overview.md for the complete architectural picture.
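The "POST launch → AAP JT 133" step can be sketched in Python as below. This is a minimal illustration, not the workflow's actual code: it assumes the controller's standard job-template launch endpoint (/api/v2/job_templates/<id>/launch/), and the token value and the slug extra var are placeholders invented for the example. The request is only built, never sent.

```python
import json
import urllib.request

# Controller base URL as given in this README; the gateway URL is avoided
# because it returns 404 for /api/v2/*.
CONTROLLER = "https://aap-controller-aap.apps.hammer.na-launch.com"


def build_launch_request(template_id, token, extra_vars):
    """Build (but do not send) the POST that launches a job template."""
    url = f"{CONTROLLER}/api/v2/job_templates/{template_id}/launch/"
    body = json.dumps({"extra_vars": extra_vars}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # controller-native OAuth2 token
            "Content-Type": "application/json",
        },
    )


# "EXAMPLE_TOKEN" and the "slug" extra var are assumptions for illustration.
req = build_launch_request(133, "EXAMPLE_TOKEN", {"slug": "vm1-tomcat"})
print(req.get_method(), req.full_url)
```

Sending it would be a call to urllib.request.urlopen(req); the extra vars that build_run.yml really expects are described in the runbooks, not here.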

Add-on: fleet readiness report

For an HTML report with Green/Yellow/Red verdicts across the estate (no manifest generation, no PRs), run the fleet-report add-on after discover:

ansible-playbook -i inventory.ini playbooks/addons/fleet-report.yml \
  -e fleet_report_customer_name="Acme Corp" \
  -e fleet_report_engagement_name="Java Migration Assessment"

Outputs migration-readiness-report.html and fleet-report-data.json.

What This Collection Does

Point this collection at any Linux VM (via SSH) and it will:

Inventory the OS, hardware, packages, services, processes, and listening ports
Discover all installed Java environments (JDK/JRE locations, versions, vendors)
Match findings against a library of Java application detector definitions
Extract configuration files and runtime details for each detected application
Perform deep inspection: JVM analysis, keystore enumeration, JDBC parsing, WAR introspection, build file analysis
Score each host for migrate-ability (Green / Yellow / Red)
Produce per-host JSON reports, a fleet HTML report, and an MTA handoff section for downstream migration tooling

Supported Java application detectors:

Application	Category	Key Detection Signals	Status
Apache Tomcat	Application Server	catalina process, port 8080/8443	Validated
WildFly / JBoss EAP	Application Server	jboss-modules.jar, port 9990	Validated
Spring Boot	Application Server	JarLauncher process, spring .jar	Experimental
Oracle WebLogic	Application Server	weblogic.Server, port 7001/7002	Experimental
IBM WebSphere Traditional	Application Server	WsServer, port 9080/9443	Experimental
IBM WebSphere Liberty	Application Server	ws-server.jar, port 9080/9443	Experimental

Experimental detectors have detection patterns defined but have not been validated against a live instance. Use at your own risk and verify results manually.

Architecture

  inventory host
        |
        v SSH + become
+----------------+   <slug>.facts.json   +-------------------+   manifests/<slug>/   +-------------+
|   discover     | --------------------> |   containerize    | --------------------> |   deploy    |
+----------------+  discovered_apps.json +-------------------+                       +-------------+
                     (fan-out manifest)   Containerfile, values.yaml,                      |
                     → fan_out.yml        Chart.ref.yaml, rendered/,                       v
                     → JT-perapp-analyze  tests/, MTA_ANALYSIS.md          Gitea PR bundle/<slug>
                       (Survey gate)           VERIFICATION.md             + verify-bundle CI
                                                                                           |
                                                                                 [Human merges PR]
                                                                                           |
                                                                                           v
                                                                            Gitea Actions runner (anaeem SNO)
                                                                            build image from source
                                                                            push → oci.arsalan.io/migrated-apps/<slug>
                                                                            POST launch → AAP JT 133
                                                                                           |
                                                                                           v
                                                                            JT 133 build_run (migration-ee helm+oc)
                                                                            helm upgrade --install (chart from Harbor)
                                                                            provision per-app DB on shared PostgreSQL
                                                                            helm test + Route /health
                                                                                           |
                                                                                           v
                                                                            pod running on anaeem SNO
                                                                            namespace: migrated-apps
                                                                            served via OpenShift Route

The one data contract: <slug>.facts.json, emitted by discover, read by containerize and every add-on. Schema is documented in docs/architecture/overview.md.

Role: discover

SSH onto a host; gather OS/process/filesystem/Java facts; detect apps via YAML detector definitions; extract config (capped 1 MB, redacted); deep-inspect each app (JVM, keystores, JDBC, environment, log config, cron jobs, build info); emit one <slug>.facts.json per high/medium-confidence detection plus discovered_apps.json.

Deep inspection captures: process user/command, PID, ports, JVM heap/GC/agents, keystores (metadata only — no key material), JDBC URLs, environment variables, log config, cron jobs referencing the app, build info, and secrets markers. Redaction is unconditional.
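To make the redaction step concrete, here is a minimal Python sketch of replacing secret values with [REDACTED] in captured config text. The key names and regular expressions are assumptions chosen for the example; the collection's real redaction rules are not shown in this README and may differ.

```python
import re

# Hypothetical list of key/attribute names that usually carry secrets.
SECRET_KEY = r"(?:password|passwd|secret|keystorePass|truststorePass)"

# XML attribute form: keystorePass="changeit"
_XML_ATTR = re.compile(rf'({SECRET_KEY}[ \t]*=[ \t]*")[^"]*(")', re.IGNORECASE)

# Properties form at line start: db.password=s3cret or db.password: s3cret
_PROP = re.compile(
    rf'^([ \t]*[\w.\-]*{SECRET_KEY}[\w.\-]*[ \t]*[=:][ \t]*)[^\s"].*$',
    re.IGNORECASE | re.MULTILINE,
)


def redact(text):
    """Replace secret values with [REDACTED], keeping the key names."""
    text = _XML_ATTR.sub(r"\1[REDACTED]\2", text)
    return _PROP.sub(r"\1[REDACTED]", text)


print(redact('<Connector keystorePass="changeit" port="8443"/>'))
print(redact("db.password=s3cret\ndb.url=jdbc:postgresql://db/app"))
```

Keeping the key name while dropping the value preserves the "a secret lives here" signal for the report without storing the secret itself.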

Role: containerize

Given one <slug>.facts.json, run:

render.yml — derive struct from fact file; render values.yaml (for the published migrated-app Helm chart) + Chart.ref.yaml + rendered/ preview.
testgen.yml — generate tests/ suite (container-structure-test, goss, http_checks, build_context.yaml, run_local.sh).
fetch_mta_results.yml — poll Konveyor hub for the app's taskgroup; fetch insights, dependencies, tags into fact variables for the LLM prompt and gate.
llm_containerfile.yml — call OpenRouter (claude-sonnet-4.6) with discovery facts + MTA mandatory issues; write Containerfile.
mta_gate.yml — write MTA_ANALYSIS.md; emit mta_gate set_stats artifact.
validate.yml — hadolint + helm lint + helm template | kubeconform.
verify_evidence.yml — write VERIFICATION.md.

For source-mode builds the Containerfile uses a multistage build that clones the source repo.

Role: deploy

Push manifests/<slug>/ to the target Gitea repo on bundle/<slug> and open (or update) a PR.

Role: build_run

Post-merge orchestration: read_bundle (clone migration-targets) → gate (MTA mandatory-issue gate) → helm_deploy (helm upgrade --install the published migrated-app chart) → provision_db (idempotent per-app role + database on shared PostgreSQL, patches chart-owned Secret, seeds db/*.sql, rollout-restart) → verify (helm test + Route GET + evidence + PR comment).

Role: fleet_report

Scores every host Green/Yellow/Red and produces a self-contained HTML report (migration-readiness-report.html) and fleet-report-data.json.

Role: drift_detect

Compares JVM args, environment variables, ports, JDBC connections, and system packages between a source VM and its containerized counterpart.

ansible-playbook playbooks/addons/drift.yml \
  -e drift_detect_baseline_file=path/to/old-fleet.json \
  -e drift_detect_current_file=path/to/new-fleet.json
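The kind of comparison drift detection performs can be sketched in Python as follows. This is an illustrative sketch only: the section names are borrowed from the example output later in this README, and the layout of the real baseline and current files is an assumption.

```python
def diff_section(baseline, current):
    """Return added/removed/changed entries between two dicts or two lists."""
    if isinstance(baseline, dict) and isinstance(current, dict):
        keys_b, keys_c = set(baseline), set(current)
        return {
            "added": sorted(keys_c - keys_b),
            "removed": sorted(keys_b - keys_c),
            "changed": sorted(k for k in keys_b & keys_c if baseline[k] != current[k]),
        }
    set_b, set_c = set(baseline), set(current)
    return {"added": sorted(set_c - set_b), "removed": sorted(set_b - set_c), "changed": []}


def detect_drift(baseline_app, current_app,
                 sections=("environment_vars", "ports", "system_packages")):
    """Report only the sections that differ between the two app records."""
    report = {}
    for name in sections:
        delta = diff_section(baseline_app.get(name, {}), current_app.get(name, {}))
        if any(delta.values()):
            report[name] = delta
    return report


old = {"environment_vars": {"JAVA_HOME": "/usr/lib/jvm/java-11-openjdk"}, "ports": [8080, 8443]}
new = {"environment_vars": {"JAVA_HOME": "/usr/lib/jvm/java-21-openjdk"}, "ports": [8080]}
print(detect_drift(old, new))
```

Sections with no differences are omitted, so an empty result means no drift in the compared sections.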

Usage

Prerequisites

Ansible 2.16+ on the control node
helm 3.x (auto-bootstrapped into /tmp/discovery-bin/ by ensure_tools.yml)
SSH access to target hosts with become privileges
OpenRouter API key (for the LLM Containerfile authoring step)
syft (optional, for SBOM generation)

Install the collection

# From the repository
ansible-galaxy collection install git+https://git.arsalan.io/anaeem/ansible-collection-discovery.git

# Or from a local checkout
cd /path/to/ansible-collection-discovery
ansible-galaxy collection build
ansible-galaxy collection install migration-discovery-*.tar.gz

Run discovery

# Against a single host
ansible-playbook -i "target-vm," playbooks/discover.yml

# Against an inventory
ansible-playbook -i inventory.ini playbooks/discover.yml

# With custom output directory
ansible-playbook -i inventory.ini playbooks/discover.yml -e discover_output_dir=/opt/reports

AAP Integration

AAP Resource	Details
Project 96	"Discovery Collection" — SCM `git.arsalan.io/anaeem/ansible-collection-discovery` @ `main`; `scm_update_on_launch: false` (sync manually)
JT 107	`playbooks/discover.yml` — scan inventory, emit fact files
JT-E fan_out	`playbooks/utils/fan_out.yml` — launch JT 98 once per discovered app
JT 98	`playbooks/per_app.yml` — Survey (source_repo / source_branch) → pair → containerize → deploy
JT 133	`playbooks/build_run.yml` — post-merge helm-install + DB provision + verify (triggered by Gitea runner)
EE 6	"migration-ee (helm+oc)" — custom EE with `helm`+`oc` bundled (required for JT 133)
Workflow	`migration-discovery-e2e` — JT 107 → review gate → JT-E → N × JT 98

Controller API base: https://aap-controller-aap.apps.hammer.na-launch.com/api/v2/ (OAuth2 Bearer token). See docs/runbooks/09-aap-and-ci.md for the access model. See docs/runbooks/04-workflow-template.md for the full walk-through.

Inventory example

[java_servers]
tomcat01.example.com
jboss01.example.com
weblogic01.example.com

[all:vars]
ansible_user=deploy
ansible_become=true

How to Add a New Detector

Adding a new Java application detector requires only creating a single YAML file in roles/discover/files/detectors/. No code changes are needed. The file is loaded automatically on the next playbook run.

Detector Schema Reference

Every field explained:

# ==============================================================
# REQUIRED FIELDS
# ==============================================================

name: "Human Readable Name"
# Display name shown in reports and logs.
# Example: "Apache Tomcat", "Oracle WebLogic"

id: "snake_case_id"
# Unique identifier used as dictionary keys in Ansible facts.
# Must be unique across all detectors. Use lowercase with underscores.
# Example: "tomcat", "jboss_wildfly", "websphere_liberty"

category: "application_server"
# Classification for grouping in reports.
# Values: application_server, web_server, database, runtime, middleware, monitoring

# ==============================================================
# DETECTION RULES (at least one section should have entries)
# ==============================================================

detect:
  processes:
  # List of regex patterns matched against `ps auxww` command lines.
  # Each entry has a single "pattern" key containing a regex string.
  # Backslash-escape dots in package names: "org\\.apache" not "org.apache"
    - pattern: "org\\.apache\\.catalina\\.startup\\.Bootstrap"
    - pattern: "-Dcatalina\\.home="

  packages:
  # List of regex patterns matched against installed RPM/DEB package names.
    - pattern: "^tomcat"
    - pattern: "^java.*openjdk"

  services:
  # List of regex patterns matched against systemd/sysvinit service names.
    - pattern: "tomcat"

  ports:
  # List of TCP port numbers (integers) matched against listening ports.
  # Exact match only -- no ranges.
    - 8080
    - 8443

  filesystem:
  # List of glob patterns matched against discovered directory paths.
  # Supports * wildcard. Matched against /opt, /usr/local, /var/lib,
  # /home, /srv, /app, /u01 (configurable via scan_dirs).
    - "/opt/tomcat*"
    - "*/apache-tomcat-*"
    - "*/bin/catalina.sh"

# ==============================================================
# VERSION DETECTION
# ==============================================================

version_command: "{app_home}/bin/version.sh 2>/dev/null | grep 'Server version' || echo 'unknown'"
# Shell command to determine the application version.
# {app_home} is replaced with the detected application home directory at runtime.
# Should output a single line containing the version string.
# Use 2>/dev/null to suppress errors, and always end with a fallback.

# ==============================================================
# CONFIGURATION FILES
# ==============================================================

config_files:
# List of configuration files to read when this application is detected.
  - name: "server_xml"
    # Identifier used as dictionary key in the report. Use snake_case.
    path: "{app_home}/conf/server.xml"
    # File path. {app_home} is replaced at runtime.

  - name: "tomcat_users_xml"
    path: "{app_home}/conf/tomcat-users.xml"
    sensitive: true
    # When true, passwords/secrets in this file are redacted with [REDACTED]
    # before being stored in the report.

  - name: "conf_dir"
    path: "{app_home}/conf.d/"
    directory: true
    # When true, lists directory contents instead of reading file content.

# ==============================================================
# HOME DIRECTORY DETECTION
# ==============================================================

home_detection:
  process_arg: "-Dcatalina.home="
  # JVM system property or command-line argument that contains the app home path.
  # Extracted from the process command line via grep.

  process_arg_alt: "-Dcatalina.base="
  # Fallback argument to try if process_arg yields nothing.

  default_home: "/opt/tomcat"
  # Static fallback path if neither process argument is found.

  default_home_alt: "/usr/share/tomcat"
  # Second static fallback path.

  env_vars:
  # Environment variables that may contain the home path.
  # Checked in /proc/PID/environ during deep inspection.
    - CATALINA_HOME
    - CATALINA_BASE

# ==============================================================
# DEEP INSPECTION FLAGS (optional)
# ==============================================================

war_introspection: true
# When true, the deep_inspect role will look for WAR/EAR files in the
# deployment directory and introspect each one (web.xml, pom.properties,
# MANIFEST.MF, WEB-INF/lib, Spring Boot detection, JPA detection).
# Capped at 20 deployments.

deployment_types:
# List of deployment artifact types this app server supports.
# Used to guide WAR/EAR introspection.
  - war
  - ear
  - jar

secrets_in_config:
# List of "filename:attribute" pairs that indicate where secrets live
# in this application's configuration files. The deep_inspect role will
# check if these exist and flag them in the report.
  - "server.xml:keystorePass"
  - "context.xml:password"

build_files:
# List of build file names to look for in the app home directory.
# When found, the deep_inspect role parses them for dependencies,
# frameworks, database drivers, and messaging libraries.
  - pom.xml
  - build.gradle
  - build.gradle.kts

# ==============================================================
# VERSION DETECTION METADATA (optional, used by deep_inspect)
# ==============================================================

version_detection:
  file: "registry.xml"
  # File relative to app_home that contains version information.
  alt_file: "lib/weblogic.jar"
  # Alternate location for version info.
  method: "manifest"
  # How to extract version: "manifest" (JAR MANIFEST.MF), "xml_element"
  # (XML tag), "properties" (Java properties file).

Example: adding an Apache Kafka detector

Create roles/discover/files/detectors/kafka.yml:

name: Apache Kafka
id: kafka
category: middleware
detect:
  processes:
    - pattern: "kafka\\.Kafka"
    - pattern: "kafka-server-start"
  packages:
    - pattern: "kafka"
  services:
    - pattern: "kafka"
  ports:
    - 9092
    - 9093
  filesystem:
    - "/opt/kafka*"
    - "*/kafka/config"
version_command: "{app_home}/bin/kafka-server-start.sh --version 2>/dev/null | head -1"
config_files:
  - name: server_properties
    path: "{app_home}/config/server.properties"
  - name: log4j_properties
    path: "{app_home}/config/log4j.properties"
home_detection:
  process_arg: "-Dkafka.logs.dir="
  default_home: "/opt/kafka"

That is all. The next time the playbook runs, Kafka will be included in detection.
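To show how detection rules like these could be evaluated, here is a hedged Python sketch that matches the Kafka detector's process, port, and filesystem rules against observed host data. The function and its inputs are hypothetical; the collection does this matching inside its Ansible tasks, and how matches map to a confidence level is not covered here.

```python
import fnmatch
import re

# The Kafka detector from above, written as a Python dict for the example.
detector = {
    "id": "kafka",
    "detect": {
        "processes": [{"pattern": r"kafka\.Kafka"}, {"pattern": "kafka-server-start"}],
        "ports": [9092, 9093],
        "filesystem": ["/opt/kafka*", "*/kafka/config"],
    },
}


def match_detector(detector, process_lines, listening_ports, directories):
    """Return the detection methods that matched (hypothetical helper)."""
    rules = detector["detect"]
    methods = []
    if any(re.search(rule["pattern"], line)
           for rule in rules.get("processes", []) for line in process_lines):
        methods.append("process")
    if set(rules.get("ports", [])) & set(listening_ports):  # exact port match
        methods.append("port")
    if any(fnmatch.fnmatchcase(path, glob)
           for glob in rules.get("filesystem", []) for path in directories):
        methods.append("filesystem")
    return methods


methods = match_detector(
    detector,
    process_lines=["kafka 2211 java -Xmx1G kafka.Kafka /opt/kafka/config/server.properties"],
    listening_ports=[22, 9092],
    directories=["/opt/kafka_2.13-3.7.0", "/var/lib/pgsql"],
)
print(methods)
```

The returned list corresponds to the detection_methods field in the report; packages and services would follow the same regex pattern as processes.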

How Java Apps Are Configured in the Wild

Java application servers share common patterns that this collection captures:

JVM Configuration: Heap sizes (-Xms/-Xmx), garbage collector selection (-XX:+UseG1GC), system properties (-Djava.io.tmpdir), and Java agents (-javaagent:) are passed on the command line. These are critical for capacity planning during migration.
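As an illustration of the JVM flags just described, the following Python sketch splits a Java command line into the fields that appear under jvm in the example output. It is a simplified sketch under the assumption that every flag is a separate whitespace-delimited token; it is not the collection's own parser.

```python
import re
import shlex


def parse_jvm_args(command_line):
    """Extract heap, GC, system properties and agents from a java command line."""
    jvm = {"heap_min": None, "heap_max": None, "gc_algorithm": None,
           "system_properties": {}, "jvm_agents": [], "xx_flags": []}
    for token in shlex.split(command_line):
        if token.startswith("-Xms"):
            jvm["heap_min"] = token[4:]
        elif token.startswith("-Xmx"):
            jvm["heap_max"] = token[4:]
        elif token.startswith("-XX:"):
            jvm["xx_flags"].append(token)
            gc = re.fullmatch(r"-XX:\+Use(\w+GC)", token)  # e.g. -XX:+UseG1GC
            if gc:
                jvm["gc_algorithm"] = gc.group(1)
        elif token.startswith("-javaagent:"):
            jvm["jvm_agents"].append(token[len("-javaagent:"):])
        elif token.startswith("-D") and "=" in token:
            key, value = token[2:].split("=", 1)
            jvm["system_properties"][key] = value
    return jvm


cmd = ("/usr/bin/java -Xms512m -Xmx2048m -XX:+UseG1GC -XX:MaxGCPauseMillis=200 "
       "-Dcatalina.home=/opt/tomcat -javaagent:/opt/agent.jar "
       "org.apache.catalina.startup.Bootstrap start")
jvm = parse_jvm_args(cmd)
print(jvm["heap_max"], jvm["gc_algorithm"])
```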

Configuration Files: Each app server has its own configuration layout (server.xml for Tomcat, standalone.xml for JBoss, config.xml for WebLogic) but they all define similar things: datasources, thread pools, security realms, and clustering. Passwords and connection strings live in these files.

Deployments: Applications are packaged as WAR (web) or EAR (enterprise) archives and dropped into a deployment directory. Each WAR contains WEB-INF/web.xml (servlet mappings), WEB-INF/lib/ (dependencies), and optionally Maven metadata and Spring Boot configuration.

Keystores: TLS certificates are stored in Java KeyStore (.jks) or PKCS12 (.p12) files. The keystore path and password are referenced in server configuration. Many production systems still use the default password changeit.

External Dependencies: Applications connect to databases (via JDBC), message queues (JMS/Kafka/RabbitMQ), and caches. These connections are defined in server config files or application properties and represent dependencies that must be available after migration.
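For the JDBC case, a short Python sketch shows how a URL-style connection string can be broken into the host, port, and database fields seen under jdbc_connections in the example output. It assumes the jdbc:<engine>://host:port/database shape used by PostgreSQL and MySQL; other formats (for example Oracle thin URLs) would need separate handling, and the "engine" key is a name invented for this example.

```python
from urllib.parse import urlsplit


def parse_jdbc_url(url):
    """Split a URL-style JDBC string into host, port and database."""
    if not url.startswith("jdbc:"):
        raise ValueError(f"not a JDBC URL: {url!r}")
    parts = urlsplit(url[len("jdbc:"):])  # drop the jdbc: prefix, parse the rest
    return {
        "url": url,
        "engine": parts.scheme,  # hypothetical field name
        "host": parts.hostname,
        "port": str(parts.port) if parts.port else None,
        "database": parts.path.lstrip("/") or None,
    }


conn = parse_jdbc_url("jdbc:postgresql://db.internal:5432/appdb")
print(conn["host"], conn["port"], conn["database"])
```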

Environment Injection: Configuration is often injected via setenv.sh (Tomcat), standalone.conf (JBoss), setDomainEnv.sh (WebLogic), or systemd unit Environment directives. These scripts set JAVA_HOME, JAVA_OPTS, CATALINA_OPTS, and custom properties.

MTA Integration Boundary

Each application in the report includes an mta_handoff section that clearly delineates responsibilities:

This collection discovers (infrastructure level):

Runtime JVM configuration (heap, GC, system properties, agents)
Listening ports and external network connections
Keystores with certificate details
JDBC datasource URLs and drivers
System packages (RPM/DEB)
Environment variables and setenv scripts
Cron jobs referencing the application
Log configuration and file paths
WAR/EAR contents (web.xml, dependencies, Maven coordinates)
Secrets in configuration files

MTA handles (code level):

Source code analysis and API compatibility
Deprecated API detection (javax -> jakarta, etc.)
Framework migration rules
Dockerfile and container manifest generation
Dependency vulnerability scanning
Code-level refactoring recommendations

Artifacts to pass to MTA:

WAR/EAR files listed in mta_handoff.artifacts_for_mta
Source repositories (if pom.xml/build.gradle found at build_info.build_files)
Configuration files listed in mta_handoff.config_for_mta

Example Output

{
  "host": "tomcat01.example.com",
  "scan_timestamp": "2026-04-10T14:30:00Z",
  "scan_version": "2.0.0",
  "scope": "java",
  "os": {
    "distribution": "Red Hat Enterprise Linux",
    "distribution_version": "8.9",
    "os_family": "RedHat",
    "kernel": "4.18.0-513.el8.x86_64",
    "architecture": "x86_64"
  },
  "hardware": {
    "vcpus": 4,
    "memory_mb": 16384,
    "swap_mb": 2048
  },
  "java_environments": [
    {
      "java_home": "/usr/lib/jvm/java-21-openjdk-21.0.10.0.7-1.el8.x86_64",
      "version": "21.0.10",
      "vendor": "Red Hat"
    },
    {
      "java_home": "/usr/lib/jvm/java-11-openjdk-11.0.25.0.9-3.el8.x86_64",
      "version": "11.0.25",
      "vendor": "Red Hat"
    }
  ],
  "discovered_applications": [
    {
      "name": "Apache Tomcat",
      "id": "tomcat",
      "category": "application_server",
      "version": "Server version: Apache Tomcat/9.0.93",
      "confidence": "high",
      "detection_methods": ["process", "service", "port", "filesystem"],
      "home_path": "/opt/tomcat",
      "jvm": {
        "java_version": "21.0.10",
        "java_home": "/usr/lib/jvm/java-21-openjdk",
        "java_vendor": "Red Hat",
        "heap_min": "512m",
        "heap_max": "2048m",
        "gc_algorithm": "G1GC",
        "system_properties": {
          "catalina.home": "/opt/tomcat",
          "catalina.base": "/opt/tomcat",
          "java.io.tmpdir": "/opt/tomcat/temp",
          "java.util.logging.config.file": "/opt/tomcat/conf/logging.properties"
        },
        "jvm_agents": [],
        "xx_flags": ["-XX:+UseG1GC", "-XX:MaxGCPauseMillis=200"]
      },
      "ports": [8080, 8443],
      "config_files": {
        "server_xml": {"path": "/opt/tomcat/conf/server.xml", "exists": true, "size": 7542},
        "context_xml": {"path": "/opt/tomcat/conf/context.xml", "exists": true, "size": 1234}
      },
      "deployments": [
        {
          "name": "myapp.war",
          "type": "war",
          "size_bytes": 45678912,
          "maven_coordinates": {
            "groupId": "com.example",
            "artifactId": "myapp",
            "version": "2.1.0"
          },
          "dependencies": [
            "spring-core-5.3.30.jar",
            "spring-web-5.3.30.jar",
            "hibernate-core-5.6.15.Final.jar",
            "postgresql-42.6.0.jar",
            "logback-classic-1.2.12.jar",
            "slf4j-api-1.7.36.jar"
          ],
          "web_xml": {
            "servlets": ["dispatcherServlet"],
            "filters": ["encodingFilter", "springSecurityFilterChain"],
            "listeners": ["org.springframework.web.context.ContextLoaderListener"]
          },
          "spring_boot": true,
          "jpa_configured": true
        }
      ],
      "jdbc_connections": [
        {
          "source_file": "/opt/tomcat/conf/context.xml",
          "url": "jdbc:postgresql://db.internal:5432/appdb",
          "host": "db.internal",
          "port": "5432",
          "database": "appdb",
          "driver": "org.postgresql.Driver"
        }
      ],
      "keystores": [
        {
          "path": "/opt/tomcat/conf/keystore.jks",
          "type": "JKS",
          "password_is_default": true,
          "aliases": ["tomcat"]
        }
      ],
      "external_connections": [
        {"remote_host": "db.internal", "remote_port": 5432, "protocol": "tcp"},
        {"remote_host": "redis.internal", "remote_port": 6379, "protocol": "tcp"}
      ],
      "log_config": {
        "type": "logback",
        "config_files": ["/opt/tomcat/webapps/myapp/WEB-INF/classes/logback-spring.xml"],
        "log_paths": ["/var/log/myapp/application.log"]
      },
      "system_packages": ["java-21-openjdk-21.0.10.0.7-1.el8.x86_64", "tomcat-native-1.2.39-1.el8.x86_64"],
      "environment_vars": {
        "CATALINA_HOME": "/opt/tomcat",
        "CATALINA_BASE": "/opt/tomcat",
        "JAVA_HOME": "/usr/lib/jvm/java-21-openjdk",
        "CATALINA_OPTS": "-Xms512m -Xmx2048m -XX:+UseG1GC"
      },
      "cron_jobs": [],
      "secrets_found": ["server.xml:keystorePass:/opt/tomcat/conf/server.xml"],
      "migration_flags": {
        "keystore_outside_app_home": false,
        "external_path_references": ["/opt/tomcat/temp", "/opt/tomcat/conf/logging.properties"],
        "session_persistence_configured": false,
        "clustering_configured": false,
        "jndi_datasources": 1,
        "custom_class_loader": false
      },
      "mta_handoff": {
        "analysis_target": "containerization",
        "artifacts_for_mta": ["myapp.war"],
        "config_for_mta": ["/opt/tomcat/conf/server.xml", "/opt/tomcat/conf/context.xml"],
        "this_collection_discovers": ["Runtime JVM configuration...", "..."],
        "mta_handles": ["Source code analysis...", "..."],
        "note": "MTA handles: dependency analysis, API compatibility, code-level migration issues. This collection handles: infrastructure-level discovery, runtime config, secrets, system packages."
      }
    }
  ],
  "summary": {
    "total_apps": 1,
    "high_confidence": 1,
    "medium_confidence": 0,
    "low_confidence": 0,
    "categories": {"application_server": 1},
    "total_java_environments": 2
  }
}
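Downstream tools can consume this report with a few lines of code. The Python sketch below recomputes the summary block from the rest of the document, which is one way to sanity-check a report. The function name is hypothetical, and the field names are taken from the example above rather than from a published schema.

```python
from collections import Counter


def summarize(report):
    """Recompute the summary block from a per-host report dict."""
    apps = report.get("discovered_applications", [])
    confidence = Counter(app.get("confidence") for app in apps)
    return {
        "total_apps": len(apps),
        "high_confidence": confidence["high"],
        "medium_confidence": confidence["medium"],
        "low_confidence": confidence["low"],
        "categories": dict(Counter(app.get("category") for app in apps)),
        "total_java_environments": len(report.get("java_environments", [])),
    }


# Trimmed-down version of the example report above.
report = {
    "java_environments": [{"version": "21.0.10"}, {"version": "11.0.25"}],
    "discovered_applications": [
        {"id": "tomcat", "category": "application_server", "confidence": "high"},
    ],
}
summary = summarize(report)
print(summary)
```

In practice the dict would come from json.load() on a per-host report file.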

Roadmap

Future runtime support (not yet implemented):

Node.js / TypeScript applications
Python web applications (Django, Flask, FastAPI)
.NET / .NET Core applications
Go applications
Ruby on Rails

Design Decisions

Java-only scope -- focused depth over breadth. Deep JVM inspection, WAR introspection, and build analysis provide migration-critical data that shallow multi-runtime scanning cannot.
ignore_errors: true on all data-gathering tasks -- VMs in the wild have missing commands, restricted permissions, and non-standard layouts. The collection should always produce a report, even if partial.
ANSI stripping on all shell output -- terminals on Fedora/RHEL can inject escape codes into piped output via grep aliases.
Sensitive file redaction -- passwords and keys in config files are replaced with [REDACTED] before capture.
1MB cap on config file reads -- prevents memory issues with unexpectedly large files.
20 deployment cap -- WAR introspection is capped to prevent runaway execution on servers with many deployments.
Default keystore password probing -- tries changeit and changeme only. Never attempts brute force. Flags default passwords as a security finding.
No Tower/AAP dependency -- produces standalone JSON files that can be consumed by any downstream tool.
MTA boundary -- explicitly separates infrastructure discovery from code analysis to avoid duplicating MTA capabilities.

License

Apache-2.0
