zzyfight
Evaluation benchmarks for generative AI in regulated industries
The project addresses the need for standardized, pre-deployment compliance evaluation of generative AI (GenAI) models in regulated sectors such as financial services and telecommunications. It gives engineers and compliance officers open-source benchmarks and tools for testing LLM outputs against sector-specific regulatory requirements, so that risks can be caught before production deployment without costly, ad hoc internal testing.
How It Works
The project is built around a Policy Engine that checks AI model outputs against sector-adaptive rules. A Sector Detector first analyzes the output to identify the relevant industry. The Policy Engine then loads the compliance rules for that domain and passes the output to a Compliance Evaluator, which produces graduated risk scores and regulation-specific reasoning rather than a binary pass/fail result. An Explainer Module justifies each identified violation with a reference to the specific regulation. A self-evolving Learner module accumulates risk features across evaluation cycles, with the aim of improving the risk assessment over time.
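The detect, load, evaluate flow described above can be illustrated with a small, self-contained sketch. Everything in it is an assumption made for illustration: the keyword lists, the `DEMO-*` rule IDs, the weights, and the function names are hypothetical and are not taken from the project's code. The Explainer and Learner modules are left out.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for the components named above; not the project's code.


@dataclass
class Violation:
    rule_id: str
    severity: str  # "LOW", "MEDIUM", or "HIGH"
    explanation: str


@dataclass
class Result:
    sector: str
    score: float  # graduated risk score in [0, 1]
    violations: list = field(default_factory=list)


# Toy keyword lists used by the sector detector (illustrative only).
SECTOR_KEYWORDS = {
    "financial": ("loan", "credit", "applicant"),
    "telecom": ("subscriber", "roaming", "network"),
}

# Toy rules per sector: (rule_id, severity, weight, required phrase, explanation).
RULES = {
    "financial": [
        ("DEMO-FIN-1", "HIGH", 0.6, "because", "Decision gives no stated reason."),
        ("DEMO-FIN-2", "MEDIUM", 0.3, "income", "No specific factor is referenced."),
    ],
    "telecom": [
        ("DEMO-TEL-1", "MEDIUM", 0.4, "consent", "No mention of customer consent."),
    ],
}


def detect_sector(text: str) -> str:
    """Pick the sector whose keywords appear most often in the text."""
    lowered = text.lower()
    counts = {s: sum(k in lowered for k in kws) for s, kws in SECTOR_KEYWORDS.items()}
    return max(counts, key=counts.get)


def evaluate(text: str) -> Result:
    """Detect the sector, load its rules, and accumulate a graduated score."""
    sector = detect_sector(text)
    lowered = text.lower()
    result = Result(sector=sector, score=0.0)
    for rule_id, severity, weight, phrase, explanation in RULES[sector]:
        if phrase not in lowered:  # a rule fires when its required phrase is missing
            result.violations.append(Violation(rule_id, severity, explanation))
            result.score += weight
    result.score = min(result.score, 1.0)
    return result


result = evaluate(
    "Based on the applicant's profile, we recommend denying the loan application."
)
print(result.sector, round(result.score, 2), [v.rule_id for v in result.violations])
```

Summing per-rule weights is only one way to obtain a graduated score; the README summary does not say how the real Compliance Evaluator computes its scores.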
Quick Start & Requirements
pip install genai-compliance-bench

from genai_compliance_bench import PolicyEngine

engine = PolicyEngine()
engine.load_sector("financial")

result = engine.evaluate(
    output="Based on the applicant's profile, we recommend denying the loan application.",
    sector="financial",
    context={"use_case": "credit_decisioning", "model": "gpt-4"},
)

print(f"Compliant: {result.passed}")
print(f"Risk score: {result.score:.2f}")
print(f"Violations: {len(result.violations)}")
for v in result.violations:
    print(f" [{v.severity}] {v.rule_id}: {v.explanation}")
    print(f" Regulation: {v.regulation_ref}")
Example Output:
Compliant: False
Risk score: 0.82
Violations: 2
[HIGH] ECOA-001: Credit decision output lacks required adverse action reasoning.
Regulation: ECOA / Regulation B, 12 CFR 1002.9
[MEDIUM] FAIR-002: Output does not reference specific, non-discriminatory factors.
Regulation: ECOA / Regulation B, 12 CFR 1002.6
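Since the evaluator returns a graduated score rather than a plain pass/fail, a pre-deployment check may want its own blocking policy. The sketch below is one possible way to do that. The `EvalResult` and `Violation` classes are hypothetical stand-ins that mirror only the fields used in the quick start (`passed`, `score`, `violations`, `severity`, `rule_id`). The `should_block` helper and its 0.5 threshold are assumptions, not part of the library, and treating `severity` as a plain string is also an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Violation:
    # Stand-in for a violation entry; severity as a string is an assumption.
    severity: str
    rule_id: str


@dataclass
class EvalResult:
    # Stand-in for the object returned by engine.evaluate() in the quick start.
    passed: bool
    score: float
    violations: list = field(default_factory=list)


def should_block(result, max_score: float = 0.5) -> bool:
    """Block on a failed check, any HIGH violation, or a score above max_score.

    The 0.5 default is an arbitrary illustrative threshold.
    """
    has_high = any(v.severity == "HIGH" for v in result.violations)
    return (not result.passed) or has_high or result.score > max_score


# Values copied from the example output above.
example = EvalResult(
    passed=False,
    score=0.82,
    violations=[Violation("HIGH", "ECOA-001"), Violation("MEDIUM", "FAIR-002")],
)
print(should_block(example))
```

In a CI job, the return value could drive the exit code so that a flagged output stops the pipeline.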
Highlighted Details
Maintenance & Community
The repository includes a CONTRIBUTING.md file outlining development setup and contribution guidelines. No specific community channels (e.g., Discord, Slack) or notable contributors/sponsorships are mentioned in the README.
Licensing & Compatibility
Limitations & Caveats
The provided README does not detail specific limitations, unsupported platforms, or known bugs. The project appears to be focused on evaluation benchmarks rather than model training or deployment infrastructure.